SBIR-STTR Award

Software to Automate the Detection of Websites that are Fraudulent or Otherwise Harmful to Consumers
Award last edited on: 12/28/2023

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$749,600
Award Phase
2
Solicitation Topic Code
IC
Principal Investigator
Michael Lai

Company Information

GGL Projects Inc

575 Market Street Suite 375
San Francisco, CA 94105
   (415) 894-5806
   jeremy@sitejabber.com
   www.sitejabber.com
Location: Single
Congr. District: 11
County: San Francisco

Phase I

Contract Number: 1014075
Start Date: 7/1/2010    Completed: 12/31/2010
Phase I year
2010
Phase I Amount
$149,600
This Small Business Innovation Research (SBIR) Phase I project will develop software to prevent the manipulation of consumer reviews of websites and online businesses. Consumer reviews are a vital component in educating consumers about the trustworthiness of websites. However, any platform that makes it easy for consumers to review websites also makes itself vulnerable to abuse from actors who write purposefully deceptive, self-promoting, or low-quality reviews. These manipulative reviews can mislead consumers and permanently damage the credibility of the platform on which the reviews are published. Preventing these reviews is difficult because although traditional spam filters are effective in filtering automated spam, they are unable to detect manipulative reviews written by humans. This project will assess the feasibility of building and training a customized content filter with additional heuristic algorithms incorporating community feedback, reviewer attributes, and supplemental third-party data, to effectively detect and remove both automated and human-generated manipulative reviews. The FBI received 275,000 complaints of online fraud in 2008. The Washington Post has estimated $100 billion is lost every year in online fraud. Sites that often present the biggest risk to consumers include health information providers, paid online service providers, small retailers, and sites based outside the US. To address this problem, the company will help consumers identify the best and worst websites quickly and easily via reviews written by members of the community. A typical use case might involve a consumer who is looking to make a purchase on an obscure and unfamiliar website. 
Using the solution, instead of taking a risk, the consumer could look up the website in question and benefit from the experiences of other consumers to learn important information such as whether the website is involved in any known scams, whether the depictions of goods or services are consistent with what is delivered, and whether there is a better website that provides similar goods or services. If successfully deployed, the solution described in this research effort will address a significant and growing problem related to e-commerce.
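
The Phase I abstract describes combining a content filter with heuristics based on community feedback and reviewer attributes. The award record does not give the company's actual method, so the following is only a minimal sketch of how such signals might be blended into one score. The `Review` fields, the weights, and the threshold are all hypothetical, and supplemental third-party data is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Review:
    # All fields are hypothetical stand-ins for the signal families in the abstract.
    content_score: float      # 0..1 output of a text-based content filter
    helpful_votes: int        # community feedback in favor of the review
    unhelpful_votes: int      # community feedback against the review
    account_age_days: int     # reviewer attribute
    prior_reviews: int        # reviewer attribute


def manipulation_score(r: Review) -> float:
    """Blend content, community, and reviewer signals into a score in [0, 1]."""
    votes = r.helpful_votes + r.unhelpful_votes
    # Laplace-smoothed share of unhelpful votes; equals 0.5 when there are no votes.
    community = (r.unhelpful_votes + 1) / (votes + 2)
    # New accounts with little history are treated as less trustworthy.
    newness = 1.0 / (1.0 + r.account_age_days / 30.0)
    inexperience = 1.0 / (1.0 + r.prior_reviews)
    reviewer = 0.5 * newness + 0.5 * inexperience
    # Illustrative weights only; a real system would fit these to labeled data.
    return 0.5 * r.content_score + 0.3 * community + 0.2 * reviewer


def flag_for_removal(r: Review, threshold: float = 0.6) -> bool:
    """Return True when the blended score reaches the (assumed) threshold."""
    return manipulation_score(r) >= threshold


# Made-up examples: a brand-new account with a spam-like review and poor
# community feedback, and a long-standing reviewer with well-received reviews.
suspicious = Review(content_score=0.9, helpful_votes=0, unhelpful_votes=8,
                    account_age_days=1, prior_reviews=0)
trusted = Review(content_score=0.1, helpful_votes=12, unhelpful_votes=1,
                 account_age_days=900, prior_reviews=40)
```

Under these assumptions, `flag_for_removal(suspicious)` is true and `flag_for_removal(trusted)` is false. A weighted sum is used here only because it is easy to inspect; the abstract does not say how the heuristics are actually combined.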

Phase II

Contract Number: 1127567
Start Date: 9/1/2011    Completed: 2/28/2014
Phase II year
2011
(last award dollars: 2012)
Phase II Amount
$600,000

This Small Business Innovation Research (SBIR) Phase II project will develop software to automatically detect a broad spectrum of websites that are fraudulent or otherwise harmful to consumers. Much work has been done on specific software capable of detecting websites hosting malware or engaged in phishing. However, software does not yet exist that can detect a broader array of harmful websites, including those selling counterfeits, selling illegal drugs, and hosting weight-loss scams, to name just a few. The challenge in doing this involves selecting the right features of fraudulent sites that, in isolation or combination, are good indicators of a site's harmfulness. Using these features, a machine learning classifier can be trained using data on known harmful websites. Unknown websites can then be run through the classifier to evaluate their potential for harm. Additional challenges involve gathering sufficient data to properly train the classifier, making the classifier general enough to detect a range of harmful sites while still maintaining accuracy, and updating the classifier in real time such that it can improve with ongoing human feedback and additional data. The principal impact of this project is the protection of consumers from online fraud. Today, consumers lack reliable resources to evaluate unfamiliar websites. Most use familiar sites like Amazon or take a gamble on Google search results. These gambles frequently result in fraud. It is believed that there are now over 250 million websites and that $100 billion is lost yearly to online fraud. While the statistics cover many types of fraud, examples of risky sites include online counterfeiters, pharmacies, and retailers. The software developed in this project will greatly improve transparency around websites and protect millions from fraud. 
The technical achievements in this project involve the use of a vector space model in converting non-discrete features of fraudulent sites into useful data that can be fed into a machine learning classifier. Additionally, this technology will include innovative feature choices, access to high-quality data, and the creation of a general classifier capable of improving itself in real time and detecting a broad array of heretofore undetectable fraudulent sites.
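
The Phase II abstract describes turning site features into vectors, training a classifier on known harmful sites, and updating it as feedback arrives. The award record does not specify the model or features, so the following is only a minimal sketch under stated assumptions. It uses normalized term-frequency vectors over page text and a logistic regression updated one example at a time; the class name `OnlineSiteClassifier`, the learning rate, and the toy training strings are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class OnlineSiteClassifier:
    """Logistic regression over sparse term-frequency vectors (hypothetical design)."""

    def __init__(self, learning_rate: float = 0.5):
        self.weights: dict[str, float] = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def vectorize(self, text: str) -> dict[str, float]:
        """Map text to a unit-length term-frequency vector (a simple vector space model)."""
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        return {term: c / norm for term, c in counts.items()}

    def predict_proba(self, text: str) -> float:
        """Estimated probability that the page text belongs to a harmful site."""
        vec = self.vectorize(text)
        z = self.bias + sum(self.weights.get(t, 0.0) * v for t, v in vec.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, text: str, harmful: bool) -> None:
        """One stochastic-gradient step, e.g. when a human labels a site."""
        vec = self.vectorize(text)
        error = (1.0 if harmful else 0.0) - self.predict_proba(text)
        for term, value in vec.items():
            self.weights[term] = self.weights.get(term, 0.0) + self.learning_rate * error * value
        self.bias += self.learning_rate * error


# Toy, made-up training data; a real system would use labeled crawls of known sites.
training = [
    ("cheap replica designer watches free shipping no prescription", True),
    ("official store returns policy customer support contact", False),
    ("lose weight fast miracle pills no prescription needed", True),
    ("read our privacy policy and contact customer support", False),
]

clf = OnlineSiteClassifier()
for _ in range(50):                 # a few passes over the toy data
    for text, label in training:
        clf.update(text, label)

p_harmful = clf.predict_proba("miracle replica pills no prescription")
p_benign = clf.predict_proba("customer support and returns policy")
```

On this toy data, `p_harmful` ends up above 0.5 and `p_benign` below it. Because `update` takes one labeled example at a time, later human feedback can be folded in without retraining from scratch, which is one plausible reading of the "real time" improvement the abstract mentions.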