SBIR-STTR Award

A Multithreaded Storage Engine using Highly-Concurrent Fractal Trees
Award last edited on: 12/28/2023

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$575,000
Award Phase
2
Solicitation Topic Code
IC
Principal Investigator
Bradley Kuszmaul

Company Information

Tokutek Inc

1 Militia Drive Suite 11
Lexington, MA 02421
   (212) 244-7600
   contact@tokutek.com
   www.tokutek.com
Location: Multiple
Congr. District: 05
County: Middlesex

Phase I

Contract Number: 0945687
Start Date: 1/1/2010    Completed: 6/30/2010
Phase I year
2009
Phase I Amount
$150,000
This Small Business Innovation Research Phase I project will investigate techniques for implementing high-performance databases on multi-core computers by focusing on how to support concurrent activity with provably good thread scheduling in "Fractal Tree" databases. Today's databases suffer from resource imbalances between storage bandwidth, disk-seek rate, and CPU core capacity, leading to underperformance, cumbersome workarounds, and energy inefficiency. The company has developed a high-performance storage engine for MySQL that maintains indexes on live data 100 times faster than traditional engines. The approach employs cache-oblivious Fractal Tree indexes, which scale with storage bandwidth rather than seek rate, thus addressing the imbalance between bandwidth and disk-seek rate. If successful, this research will produce a database implementation that, for each query, either saturates the CPU cores, saturates disk bandwidth, or consumes all of the inherent parallelism in the query. The target market comprises organizations that have very large databases and a workload dominated by insertions and queries. Many application areas do not employ databases because database performance is too slow, so orders-of-magnitude speedups can help grow the market. Applications in finance, retail, homeland security, telecommunications, and scientific computing will benefit from high-performance databases. Furthermore, the researchers hope to lead all database implementers into the multi-core realm. The proposed research will further the understanding of how to schedule database queries when data is well laid out on disk. As users' appetite for data continues to outstrip the availability of fast memory, organizing multithreaded queries on disk-based data for performance will only grow in importance.

Phase II

Contract Number: 1058565
Start Date: 2/1/2011    Completed: 7/31/2013
Phase II year
2011
Phase II Amount
$425,000
This Small Business Innovation Research (SBIR) Phase II project will apply multithreading techniques to provide multi-terabyte (and larger) high-performance databases in MySQL. The company has developed a high-performance storage engine for MySQL, which maintains indexes on live data 100 times faster than current commonly used structures. The technology solves the problem of maintaining indexes on large databases in the face of high trickle-load indexing rates. In Phase I, the company developed a multithreaded bulk loader to solve the problem of how to load data quickly. The next significant research problems for large MySQL databases are to allow online, or "hot", schema changes, in which, for example, an index can be added without taking the database down, and to use multithreading to speed up joins and reductions so that large data sets can be queried quickly. In this project, the researchers will investigate the use of multithreading to support hot indexing and parallel joins and reductions. If successful, multi-terabyte and larger databases will be manageable and fast on modest hardware, and performance will scale with both CPU cores and disks. The broader impact of this work is driven by faster, cheaper, lower-power on-disk storage. Organizations that have very large databases will be able to use much less hardware, both saving money and significantly reducing power consumption. Currently, many application areas do not employ databases because their performance is too slow. Speeding up databases by two orders of magnitude can help grow the market. Currently, many organizations fail to make good use of the data they have collected because they cannot manage it, index it, or query it fast enough to be useful. Applications in finance, retail, homeland security, telecommunications, and scientific computing will benefit from improved manageability and performance.
As users' appetite for data continues to outstrip the availability of fast memory, organizing multithreaded queries on disk-based data for performance will continue to grow in importance.