/*
* Licensed und
Q:
How to avoid r
Gmina Bukowina Tat
New Yorkers Agains
The present invent
Electric power pro
Efficient producti
Effect of dietary
The present inventQ:
PHP/MySQL - How to calculate percentiles
I am trying to calculate the 50th, 90th and 95th percentile of a very large and growing MySQL table. It contains over 16 million rows. I will have to perform this operation periodically and want to do it the most efficient way possible. The table stores measurements in millis for users that visited the site. The table looks something like this:
id time (millis) ...
1 2332 ...
2 3245 ...
3 3446 ...
4 3324 ...
5 3452 ...
6 5234 ...
I want to calculate percentile and get something like this:
50th percentile 90th percentile 95th percentile
2332 3324 3452
5234 3324 3452
Is there a better way to do it than the one I came up with below?
SELECT (
SELECT MIN(time) as min_time
FROM my_table
WHERE time < (SELECT time FROM my_table ORDER BY time DESC LIMIT 1)) AS min_time,
(SELECT MAX(time) as max_time
FROM my_table
WHERE time > (SELECT time FROM my_table ORDER BY time ASC LIMIT 1)) AS max_time
FROM my_table;
That query would work if there was an index on time column, but as you can imagine, it's too expensive and only the "last" time is available for calculating percentile.
A:
This might work, I assume the id is an int?
SELECT id, time,
@sum := @sum + t1.time AS sum_times,
@counter := @counter + 1 AS occurrence,
@rank := @rank + 1 AS rank,
FROM ( SELECT id, time,
IF(@counter = 0, @id := 0, @id := @id + 1) AS counter,
@rank := @rank + 1 AS rank,
@sum := @sum + time AS sum_times
FROM yourtable
ORDER BY time
LIMIT 1
) AS t1
JOIN ( SELECT @counter := 0, @id := 0, @rank := 0 ) AS t2
JOIN yourtable y
ON y.id = @id
ORDER BY counter DESC
LIMIT 1
A:
I think you can just
SELECT MIN(time) as min_time,
MAX(time) as max_time
FROM my_table
ORDER BY time ASC
LIMIT 1
If there are only one row in the query then the query will be run faster.
A:
You can find a good explanation and working code here.
This is how you can calculate percentile without knowing the number of rows in the table.
SELECT @rank := @rank + 1 AS rank,
@percent := (SELECT COUNT(1) / (
(SELECT COUNT(1) FROM table_name) *
(SELECT 1 + @percent / 100.0)
) AS percentile
FROM dual
CONNECT BY LEVEL <= (SELECT COUNT(1) FROM table_name));
SELECT @rank := 1,
@percent := @percent + (r1.time - r2.time) / (SELECT COUNT(1) FROM table_name) AS percentile
FROM (SELECT MIN(time) as min_time, MAX(time) as max_time FROM table_name) r1
JOIN (SELECT MIN(time) as min_time, MAX(time) as max_time FROM table_name) r2
ORDER BY min_time;
The above query will give you the value for percentile 50, 90 and 95. For more values you can insert the number of numbers in the table. You can do this in the below query
INSERT INTO
(SELECT @rank := 1,
@percent := @percent + (r1.time - r2.time) / (SELECT COUNT(1) FROM table_name) AS percentile
FROM (SELECT MIN(time) as min_time, MAX(time) as max_time FROM table_name) r1
JOIN (SELECT MIN(time) as min_time, MAX(time) as max_time FROM table_name) r2
ORDER BY min_time) AS t
VALUES
('50'),
('90'),
('95');
For this query you don't need the index. If you still want an index just add a new index on min(time) and/or max(time).
You can find some examples of other ways to calculate percentile without an index here.
A:
You can try this:
SELECT x.min_time, x.max_time, x.percentile
FROM (SELECT min(time) min_time, max(time) max_time, @rank := 0) AS x,
(SELECT @rank := @rank + 1) AS s,
(SELECT COUNT(*) *
(SUM(time) - @rank) /
COUNT(*) OVER () AS percentile) AS percentile_percentile;
SQLFiddle Demo