In disaster response situations, it is often necessary to use eVTOL/UAM vehicles to search for and identify humans when debris makes affected areas inaccessible on foot. We propose an auditory assist mechanism, paired with visual recognition capabilities, that detects human vocal sounds against background noise. In an initial phase, we will assess the ability of trained machine learning platforms to determine whether a human voice is present in a synthetic training set consisting of ambient sound with or without voices added. The second phase will incorporate the visual spectrum, training a machine learning platform to pair audio cues with automated video analysis. If successful, a following phase would seek to integrate this technology into a prototype vocal and computer vision subsystem for search-and-rescue applications, leveraging both the audio and visual inputs of a scene to determine whether an individual or group of individuals needs rescue.
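To make the Phase 1 dataset construction concrete, the following is a minimal sketch, not the proposed implementation, of one way such a synthetic set could be assembled. It assumes mono floating-point waveforms at a common sample rate; the function names (`mix_at_snr`, `make_dataset`) and the SNR range are illustrative choices, not specifics from the proposal. Negative examples are ambient recordings alone; positives are formed by mixing a voice clip into the ambient bed at a controlled signal-to-noise ratio.

```python
import numpy as np

def mix_at_snr(voice: np.ndarray, ambient: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `voice` so the voice-to-ambient power ratio equals `snr_db` dB,
    then add it to `ambient`. Inputs are mono float arrays of equal length."""
    voice_power = np.mean(voice ** 2)
    ambient_power = np.mean(ambient ** 2)
    # Solve 10*log10(gain^2 * P_voice / P_ambient) = snr_db for the gain.
    gain = np.sqrt(ambient_power / voice_power * 10 ** (snr_db / 10))
    return ambient + gain * voice

def make_dataset(voices, ambients, snr_range=(-5.0, 15.0), rng=None):
    """Yield (waveform, label) pairs: label 1 = voice present, 0 = ambient only."""
    rng = rng or np.random.default_rng()
    for ambient in ambients:
        # Negative example: the ambient bed with no voice added.
        yield ambient, 0
        # Positive example: a random voice clip mixed in at a random SNR.
        voice = voices[rng.integers(len(voices))]
        n = min(len(voice), len(ambient))
        snr_db = rng.uniform(*snr_range)
        yield mix_at_snr(voice[:n], ambient[:n], snr_db), 1
```

Sweeping the mixing SNR over a range (here, a hypothetical -5 dB to +15 dB) would let the Phase 1 assessment characterize detection performance as a function of how deeply the voice is buried in the ambient noise.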