### Optimization and Learning Approaches for Energy Harvesting Wireless Communication Systems

Emerging technologies such as the Internet of Things (IoT) and Industry 4.0 are now possible thanks to the advances in wireless sensor networks. In such applications, the wireless communication nodes play a key role because they provide the connection between different sensors as well as the communication to the outside world. In general, these wireless communication nodes are battery operated. However, depending on the specific application, charging or replacing the batteries can be too expensive or even infeasible, e.g., when the nodes are located in remote locations or inside structures. Therefore, in order to provide sustainable service and to reduce the operating expenses, energy harvesting (EH) has been considered as a promising technology in which the nodes collect energy from the environment using natural or man-made energy sources such as solar or electromagnetic radiation. The idea behind EH is that the wireless communication nodes can recharge their batteries while in idle mode or while transmitting data to neighboring nodes. As a result, the lifetime of the wireless communication network is not limited by the availability of energy.

The consideration of EH brings new challenges in the design of transmission policies. This is because, in addition to the fluctuating channel conditions and data arrival processes, the variability of the amount of energy available for the communication should be taken into account. Moreover, the three processes, EH, data arrival and channel fading, should be jointly considered in order to achieve optimum performance. In this context, this dissertation contributes to the research on EH wireless communication networks by considering power allocation and resource allocation problems in four different scenarios, namely, EH point-to-point, EH two-hop, EH broadcast and EH multiple access, which are the fundamental constituents of more complicated networks. Specifically, we determine the optimal allocation policies and the corresponding upper bounds of the achievable performance by considering offline approaches in which non-causal knowledge regarding system dynamics, i.e., the EH, data arrival and channel fading processes, is assumed. Furthermore, we overcome this unrealistic assumption by developing novel learning approaches, based on reinforcement learning, under the practical assumption that only causal knowledge of the system dynamics is available.

First, we focus on the EH point-to-point scenario where an EH transmitter sends data to a receiver. For this scenario, we formulate the power allocation problem for throughput maximization considering not only the transmit power, but also the energy consumed by the circuit. Adopting an offline approach, we characterize the optimum power allocation policy and exploit this analysis in the development of a learning approach. Specifically, we develop a novel learning algorithm which considers a realistic EH point-to-point scenario, i.e., only causal knowledge of the system dynamics is assumed to be available. For the proposed learning algorithm, we exploit linear function approximation to cope with the infinite number of values the harvested energy, the incoming data and the channel coefficients can take. In particular, we propose four feature functions which are inspired by the characteristics of the problem and the insights gained from the offline approach. Through numerical simulations, we show that the proposed learning approach achieves a performance close to the offline optimum without the requirement of non-causal knowledge of the system dynamics. Moreover, it can achieve a performance up to 50% higher than the performance of reference learning schemes such as Q-learning, which do not exploit the characteristics of the problem.
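As a rough illustration of how linear function approximation handles continuous states, the following minimal sketch represents the action value as a weighted sum of features and updates the weights with a SARSA-style rule. The four features, the state layout `(battery, gain)`, the function names and the step sizes are all hypothetical choices made for this example; they are not the feature functions or the algorithm proposed in the dissertation.

```python
import math

def features(state, action):
    """Hypothetical feature vector (illustrative only, not the dissertation's).

    state  = (battery, gain): continuous battery level and channel power gain
    action = energy spent on transmission in the current time interval
    """
    battery, gain = state
    return [
        1.0,                                 # bias term
        math.log2(1.0 + gain * action),      # throughput obtained now
        battery - action,                    # energy left for later intervals
        1.0 if action > battery else 0.0,    # flags an infeasible action
    ]

def q_value(weights, state, action):
    """Approximate action value as a linear combination of the features."""
    return sum(w * f for w, f in zip(weights, features(state, action)))

def sarsa_update(weights, s, a, reward, s_next, a_next, alpha=0.05, gamma=0.9):
    """One SARSA step: move the weights along the temporal-difference error."""
    td_error = reward + gamma * q_value(weights, s_next, a_next) - q_value(weights, s, a)
    return [w + alpha * td_error * f for w, f in zip(weights, features(s, a))]

# Example: a single update starting from all-zero weights.
weights = sarsa_update([0.0] * 4, (2.0, 1.0), 1.0, 1.0, (1.5, 0.5), 0.5)
```

Because only the weight vector is stored, the memory requirement does not depend on how many distinct battery or channel values occur, which is the point of replacing a lookup table with a parametric approximation.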

Secondly, we investigate an EH two-hop scenario in which an EH transmitter communicates with a receiver via an EH relay. For this purpose, we consider the main relaying strategies, namely, decode-and-forward and amplify-and-forward. Furthermore, we consider both the transmit power and the energy consumed by the circuit in each of the EH nodes. For the EH decode-and-forward relay, we formulate the power allocation problem for throughput maximization and consider an offline approach to find the optimum power allocation policy. We show that the optimal power allocation policies of both nodes, transmitter and relay, depend on each other. Additionally, following a learning approach, we investigate a more realistic scenario in which the EH transmitter and the EH decode-and-forward relay have only partial and causal knowledge about the system dynamics, i.e., each node has only causal knowledge about the EH, data arrival and channel fading processes associated with it. To this end, two novel learning algorithms are proposed which take into account whether or not the EH nodes cooperate with each other to improve their learning processes. For the cooperative case, we propose the inclusion of a signaling phase in which the EH nodes exchange their current parameters. Through numerical simulations, we show that by providing the nodes with a complete view of the system state in a signaling phase, a performance gain of up to 40% can be achieved compared to the case when no cooperation is considered. Following a similar procedure, we investigate the EH two-hop scenario with an EH amplify-and-forward relay. We show that the resulting power allocation problem for throughput maximization is non-convex. Consequently, we propose an offline approach based on a branch-and-bound algorithm tailored to the EH two-hop scenario to find the optimal power allocation policy.
Additionally, a centralized learning algorithm is proposed for the realistic case in which only causal knowledge of the system dynamics is available. The proposed learning approach exploits the fact that, with an amplify-and-forward relay, the communication between the transmitter and the receiver depends on a single effective channel, which is composed of the link between the transmitter and the relay, the relay gain and the channel from the relay to the receiver. By means of numerical simulations, we show that the proposed learning algorithm achieves a performance up to two times higher than the performance achieved by reference schemes. The extension of the proposed approaches to EH multi-hop scenarios is also discussed.
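The notion of a single effective channel can be made concrete with the standard textbook model of a two-hop amplify-and-forward link, sketched below. The signal model, the function name `af_effective_snr`, its parameters and the unit noise powers are assumptions made for this illustration; the abstract does not specify the model used in the dissertation.

```python
def af_effective_snr(p_tx, p_relay, g1, g2, noise1=1.0, noise2=1.0):
    """End-to-end SNR of a two-hop amplify-and-forward link (textbook model).

    p_tx, p_relay  : transmit powers of the transmitter and the relay
    g1, g2         : channel power gains of the first and second hop
    noise1, noise2 : noise powers at the relay and at the receiver
    All names are hypothetical and chosen only for this example.
    """
    # Squared relay gain, chosen so that the relay transmits with power p_relay.
    relay_gain_sq = p_relay / (p_tx * g1 + noise1)
    # Useful signal power at the receiver: first hop, relay gain, second hop.
    signal = relay_gain_sq * g1 * g2 * p_tx
    # Relay noise is amplified and forwarded, then receiver noise is added.
    noise = relay_gain_sq * g2 * noise1 + noise2
    return signal / noise

# With per-hop SNRs gamma1 = 2 and gamma2 = 3, the result matches
# gamma1 * gamma2 / (gamma1 + gamma2 + 1) = 1.
snr = af_effective_snr(p_tx=2.0, p_relay=3.0, g1=1.0, g2=1.0)
```

Under this model the receiver only observes the product of the first hop, the relay gain and the second hop, so a learning agent can treat the two-hop link as one channel whose quality depends on both power choices.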

Thirdly, an EH broadcast scenario in which an EH transmitter sends individual data to multiple receivers is studied. We show that the power allocation problem for throughput maximization in this scenario leads to a non-convex problem when an arbitrary number of receivers is considered. However, following an offline approach, we find the optimal power allocation policy for the special case when two receivers are considered. Furthermore, inspired by the offline approach for two users, a novel learning approach which does not pose any restriction on the number of receiver nodes is developed. The proposed learning approach is a two-stage learning algorithm which separates the learning task into two subtasks: determining how much power to use in each time interval and deciding how to split this selected power for the transmission of the individual data intended for each receiver. Through numerical simulations, we show that the separation of tasks leads to a performance up to 40% higher than the one achieved by standard learning techniques, especially for large numbers of receivers.

Finally, an EH multiple access scenario is considered in which multiple EH transmitters communicate with a single receiver using multiple orthogonal resources. In this case, the focus is on the formulation of the resource allocation problem considering the EH processes at the different transmitters. We show that the resulting resource allocation problem falls into the category of non-linear knapsack problems, which are known to be NP-hard. Therefore, we propose an offline approach based on dynamic programming to find the optimal solution. Furthermore, by exploiting the characteristics of the scenario, a novel learning approach is proposed which breaks the original resource allocation problem into smaller subproblems. As a result, it is able to handle the exponential growth of the space of possible solutions when the network size increases. Through numerical simulations, we show that, in contrast to conventional reinforcement learning algorithms, the proposed learning approach is able to find the resource allocation policy that aims at maximizing the throughput when the network size is large. Furthermore, it achieves a performance up to 25% higher than the performance of the greedy policy that allocates the resources to the users with the best channel conditions.
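To give a flavor of how dynamic programming applies to a non-linear knapsack problem, the sketch below distributes a fixed number of orthogonal resources among users whose utilities are arbitrary functions of the number of resources received. It is a generic simplification: the utility table, the function name `allocate_resources` and the omission of the EH processes and time dimension are assumptions for this example, not the formulation used in the dissertation.

```python
def allocate_resources(utilities, n_resources):
    """Generic non-linear knapsack solved by dynamic programming.

    utilities[k][n] is the (hypothetical) throughput of user k when it is
    given n resources, for n = 0..n_resources.
    Returns (best_total, allocation) with allocation[k] resources for user k.
    """
    n_users = len(utilities)
    # best[k][r]: best total utility using the first k users and r resources.
    best = [[0.0] * (n_resources + 1) for _ in range(n_users + 1)]
    choice = [[0] * (n_resources + 1) for _ in range(n_users + 1)]
    for k in range(1, n_users + 1):
        for r in range(n_resources + 1):
            best[k][r] = float("-inf")
            for n in range(r + 1):  # resources handed to user k-1
                value = best[k - 1][r - n] + utilities[k - 1][n]
                if value > best[k][r]:
                    best[k][r] = value
                    choice[k][r] = n
    # Backtrack through the stored choices to recover the allocation.
    allocation = []
    r = n_resources
    for k in range(n_users, 0, -1):
        allocation.append(choice[k][r])
        r -= choice[k][r]
    allocation.reverse()
    return best[n_users][n_resources], allocation

# Two users, two resources; the second user gains most from holding both.
total, allocation = allocate_resources([[0, 3, 4], [0, 2, 6]], 2)
```

The table has one entry per (user, resource budget) pair and each entry scans at most `n_resources + 1` options, so the cost grows polynomially in the number of users and resources for this simplified version, whereas enumerating all allocations grows exponentially.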

Additionally, in order to carry out a full assessment of the proposed learning algorithms, we provide convergence guarantees and a computational complexity analysis for all the developed learning approaches in the four considered scenarios.