The result set from the query:
SELECT salesman, state, SUM(sales)
FROM company.sales
GROUP BY salesman, state;
provides an answer for the query:
SELECT salesman, SUM(sales)
FROM company.sales
GROUP BY salesman
The result from a query like:
SELECT date, salesman, SUM(sales)
FROM company.sales
GROUP BY date, salesman
WHERE date > '2003-01-01'
provides an answer for the following query:
SELECT date, salesman, SUM(sales)
FROM company.sales
GROUP BY date, salesman
WHERE date > '2003-01-01' AND date > '2003-06-01'
The query:
SELECT salesman, gender, SUM(sales)
FROM company.sales
INNER JOIN salesman USING (salesman_id)
GROUP BY salesman, gender
provides an answer for:
SELECT salesman, SUM(sales)
FROM company.sales
GROUP BY salesman
SELECT SUM(quantity), SUM(price)
FROM company.sales
GROUP BY salesman
provides an answer for:
SELECT SUM(quanity)
FROM company.sales
GROUP BY salesman
1. We take an initial query that we will cache:
SELECT salesman, state, COUNT(*)
FROM company.sales
GROUP BY salesman, state;
2. We execute the query, cache the result and return it to the user:
CREATE TABLE cache.1234567
SELECT salesman, COUNT(*)
FROM company.sales
GROUP BY salesman, state;
SELECT * FROM cache.1234567;
3. We get a follow-up query:
SELECT state, COUNT(*)
FROM sales
GROUP BY state;
4. We use the cache to answer the query:
SELECT state, SUM(`COUNT(*)`) AS `COUNT(*)`
FROM cache.1234567
GROUP BY state
COUNT(*) is a valid column name in cache.1234567.
1. For each cached query, we mantain:
2. A subsequent query results in a cache hit if:
This system will be possible under the following assumptions. They are easily satisfied if the database client behaves in a certain way and will also occur naturally in pivot tables and other data mininig applications. The net result is that the cache will be able to answer a broader range of queries, making it feasible.
SELECT COUNT(*)
FROM company.sales
WHERE date > CURDATE()
will be used to answer
SELECT COUNT(*)
FROM company.sales
WHERE date > CURDATE() AND date > NOW()
even though the latter contains a redundant expression, it will help in producing a cache hit.
SELECT SUM(price)
FROM company.sales
GROUP BY state
will be silently rewritten into:
SELECT SUM(price), MIN(price), MAX(price), AVG(price)
FROM company.sales
GROUP BY state
and all 4 columns will be cached. This comes from the assumption that the table will be scanned anyway to produce the SUM(), it does not hurt to calculate a few more values in the process.
Given a knowledge in the hierarchy of aggregations, a query:
SELECT SUM(price)
FROM company.sales
GROUP BY country
will be transformed into:
SELECT SUM(price)
FROM company.sales
GROUP BY country, state
The result set from the updated query will be able to answer both the original one, as well as any futher query resulting from a drill-down in the dimension of the original query.
Given enough information about many-to-one relationships, for the query
SELECT salesman_id, SUM(price)
FROM company.sales
GROUP BY salesman_id
the cache will execute and cache
SELECT salesman_id, salesmen.sex, salesmen.age, SUM(sales.price)
FROM company.salesmen
LEFT JOIN company.sales
USING (salesman_id)
GROUP BY salesman_id
Which will allow the following query to be answered from the cache:
SELECT age, COUNT(*)
FROM company.salesmen
GROUP BY age