The broader impact/commercial potential of this Small Business Technology Transfer (STTR) project will be to develop an analysis software package that significantly reduces health care costs while improving patient care by helping select the correct treatment for each patient. Every year, an estimated 1.4 million women undergo unnecessary treatments for breast cancer, at a cost to society of $32.2 billion. At the same time, some patients do not receive the treatment they need. For instance, chemotherapy is not routinely recommended after surgical tumor removal for patients with early-stage lung cancer, even though the disease will recur in a large number of them. The ability to correctly identify disease subtypes and patient subgroups is a precondition for distinguishing between patients who need the most aggressive treatments and those whose disease will never progress or recur. Further, the proposed approach can improve the results of clinical trials. Of an estimated 2,300 Phase III clinical trials per year in the US, a full 50% are destined for failure, with a loss of $1 billion/year. Such failures can be avoided if the correct inclusion criteria are defined and the drug is administered only to the people most likely to respond.
This STTR Phase I project proposes to develop a novel software package able to identify subtypes of disease based on the integration of multiple types of omics data. Many drug candidates fail and many patients receive inappropriate treatment because of the current inability to distinguish between subgroups of patients and/or subtypes of disease. Many attempts to achieve this based solely on gene expression signatures have been undertaken, but they have yielded only modest success so far. In addition, very few approaches are able to combine multiple data types, and most of the time analyzing each data type separately leads to different subgroups that are very hard to interpret.
The technology proposed here will be able to discover clinically relevant disease subtypes by integrating multiple types of high-throughput data, such as mRNA, miRNA, and methylation profiles. The goal of this project is to implement this technology as a software package that will facilitate its application at a large scale, consistent with real-world use. In addition, the plan is to assess the feasibility of this technology through an extensive comparison with the top three existing approaches (consensus clustering, similarity network fusion, and iClusterPlus) using data from over 1,800 real patients drawn from 12 different studies.