NOAA has used historical documents such as ship logs, among many other sources, to collect weather data critical to modeling global and regional climate and weather conditions. Optical character recognition (OCR) technology developed over the past three decades remains limited in its ability to recognize handwriting and to extract text reliably in context. Machine learning (ML) algorithms can help improve these processes. Given the importance of accuracy for weather data, we propose to develop and test a custom OCR/text extraction application built with OpenCV and Tesseract, both open-source tools that run within the open-source Python environment. PyTorch will be evaluated as the deep learning library for optimizing the OpenCV and Tesseract integration and for post-processing. We submit that, compared with previous efforts, this integration will provide more flexible pre-processing without undue complexity or a steep learning curve, require less post-processing, and establish a framework for automating pre-processing, post-processing, and tuning. Our objectives are to:
1. Demonstrate the feasibility of OpenCV as an image pre-processing tool for document layout analysis.
2. Demonstrate the feasibility of Tesseract as a text extraction tool.
3. Demonstrate the feasibility of PyTorch as an adaptive deep learning library for post-processing and information extraction.
4. Validate the performance of the integrated application.
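Objective 4 does not name a metric. One common choice for OCR evaluation, assumed here rather than taken from the proposal, is character error rate (CER): the edit distance between the OCR output and a hand-verified transcription, divided by the length of that transcription. Word error rate (WER) applies the same computation to word tokens. The following is a minimal, dependency-free sketch; the function names and the sample log entry are hypothetical illustrations, not part of the proposed application.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Edit distance (insertions, deletions, substitutions each cost 1).

    Works on strings (character level) or lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of characters in the ground truth."""
    if not ref:
        raise ValueError("ground-truth transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def word_error_rate(ref: str, hyp: str) -> float:
    """WER = word-level edit distance / number of ground-truth words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("ground-truth transcription must contain words")
    return levenshtein(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical ship-log entry: OCR misreads "12" as "17".
    truth = "Wind NE 12 knots"
    ocr = "Wind NE 17 knots"
    print(character_error_rate(truth, ocr))  # 0.0625 (1 error / 16 characters)
    print(word_error_rate(truth, ocr))       # 0.25 (1 error / 4 words)
```

Reporting both numbers is useful for weather records: a single misread digit barely moves CER but corrupts a whole field, which WER makes visible.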