I recently had to rewrite a query that was running very slowly. It was taking over 30 seconds.

Upon analyzing it, I saw that it had 2 correlated sub-queries in it, and each was doing a count(*).

This meant that for each row in the main query, the two inner queries were running. In this case, the table the outer query was running in had just been increased from about 300 records to about 1500, so suddenly the query was running slowly. This kind of problem becomes more apparent as the number of rows in the outer query increases.

Here is the original query:

select automobiles.name, automobiles.model
,        (select count(*)
        from manufacturers
               inner join locatables
                 on manufacturers.id = locatables.locatables_id
        where automobiles.id = locatables.location_id
          and locatables.locatables_type = 'domestic'
          and manufacturers.deleted_at is null)
    as manufacturers_count,
       (select count(*)
        from locations
               inner join locatables
                 on locations.id = locatables.locatables_id
        where automobiles.id = locatables.location_id
          and locatables.locatables_type = 'domestic'
          and locations.deleted_at is null)
    as locations_count
from automobiles
where automobiles.deleted_at is null

And here is the refactored version:

select manufacturer_count.cnt as manufacturer_count, 
	   location_count.cnt as location_count, 
	   a.name,
	   a.model
    from automobiles a
    join(
    select l.location_id, count(l.locatables_id) as cnt
    from manufacturers m
    inner join locatables l
    on m.id = l.locatables_id
    and l.locatables_type = 'domestic'
    and m.deleted_at is null
    where location_id is not null
    group by l.location_id
    order by count(l.locatables_id) desc
    ) manufacturer_count on a.id = manufacturer_count.location_id
    join(
    select l.location_id, count(l.locatables_id) as cnt
    from locations c
    inner join locatables l
    on c.id = l.locatables_id
    and l.locatables_type = 'domestic'
    and c.deleted_at is null
    where location_id is not null
    group by l.location_id
    order by count(l.locatables_id) desc
    ) location_count on a.id = location_count.location_id

What I did was turn each of the sub-queries into a derived table, and then joined on those tables. To do this, I needed to have a column to match on in the join.

Because they were from my work, the queries I show here are similar, but not exactly the same as the real ones I used. I changed the table names and other values.

A great resource for learning about derived tables can be found at http://www.mysqltutorial.org/mysql-derived-table.

A faster alternative to correlated sub-queries