Phishing attacks fool end-users into disclosing sensitive information (e.g., passwords, trade secrets, and national security secrets) or installing malware on their computers. While a number of automated solutions have been developed to mitigate these attacks, they have drawbacks in terms of scalability, timeliness, and accuracy. Our goal is to develop novel anti-phishing email and web filtering techniques that overcome these limitations and dramatically improve the state of the art. Our work will combine two new technologies originally developed at Carnegie Mellon University with novel linguistic analysis techniques. Together, these technologies will make it possible to deploy email and web filtering solutions that are considerably more accurate than today's solutions while remaining highly scalable and effective against zero-hour attacks. We have three key objectives for this Phase I effort: (1) evaluating system architectures that integrate our existing anti-phishing technologies in a way that is highly accurate, timely, and scalable for both the web and email; (2) developing and evaluating natural language processing techniques that detect both spear-phishing and reply-to emails; and (3) refining go-to-market strategies for these filters, in terms of deployment options and business models.
Keywords: Phishing, Email Filter, Web Filter, Natural Language Processing, Machine Learning, Information Assurance, Computer Security