Li K., Ni W., Tovar E., Guizani M.

IEEE Internet of Things Journal


Employing Unmanned Aerial Vehicles (UAVs) as aerial data collectors in Internet-of-Things (IoT) networks is a promising technology for large-scale environment sensing. A key challenge in UAV-aided data collection is that UAV maneuvering gives rise to buffer overflows at the IoT nodes and to unsuccessful transmissions over lossy airborne channels. This paper formulates the joint optimization of flight cruise control and data collection scheduling to minimize network data loss as a Partially Observable Markov Decision Process (POMDP), in which the states of individual IoT nodes may be hidden from the UAV. The problem can be solved optimally by reinforcement learning, but it suffers from the curse of dimensionality and rapidly becomes intractable as the number of IoT nodes grows. In practice, a UAV-aided IoT network has a large number of network states and actions in the POMDP, while up-to-date knowledge of those states is not available at the UAV. We propose an onboard Deep Q-Network based Flight Resource Allocation Scheme (DQN-FRAS) to optimize the UAV's online flight cruise control and data scheduling given outdated knowledge of the network states. Numerical results demonstrate that DQN-FRAS reduces packet loss by over 51% compared with existing non-learning heuristics.
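To make the two loss sources and the learning step concrete, the following is a minimal sketch under stated assumptions, not the DQN-FRAS implementation. It assumes a slotted toy model in which the scheduled node sends one packet per slot, a failed transmission counts as a lost packet, and arrivals beyond the buffer capacity are dropped. It also uses a tabular Q-learning update as a stand-in for the deep Q-network, which replaces the table with a neural approximator. The names `step`, `q_update`, `capacity`, `alpha`, and `gamma` are hypothetical and do not come from the paper.

```python
def step(buffers, node, arrivals, capacity, success):
    """Advance a toy network by one slot.

    Returns (new_buffers, lost_packets). Assumption: one packet is sent
    per slot by the scheduled node, and a failed transmission loses it.
    """
    new_buffers = list(buffers)
    lost = 0
    # The scheduled node transmits one packet if it has any queued.
    if new_buffers[node] > 0:
        new_buffers[node] -= 1
        if not success:
            lost += 1  # loss due to the lossy channel
    # New data arrives at every node; overflow beyond capacity is dropped.
    for i, arrived in enumerate(arrivals):
        total = new_buffers[i] + arrived
        if total > capacity:
            lost += total - capacity  # loss due to buffer overflow
            total = capacity
        new_buffers[i] = total
    return new_buffers, lost


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q maps (state, action) pairs to values; missing entries default to 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Usage: schedule node 0, then learn from the negative packet loss as reward.
buffers = [2, 3]
next_buffers, lost = step(buffers, node=0, arrivals=[1, 1],
                          capacity=3, success=True)
q_table = {}
q_update(q_table, tuple(buffers), 0, -lost, tuple(next_buffers), n_actions=2)
```

Using the negative packet loss as the reward matches the stated objective of minimizing network data loss. The table in this sketch grows with every combination of buffer levels, which illustrates why the tabular form becomes intractable as nodes are added.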