Diving into Ecto Part 2

Trolling for Data

In Part 1 we setup our schema and created sample data for our FishingSpot app. Now we are finally ready to get our hands dirty querying that data. We have fishermen, fishing trips and a bunch of fish in our database. Let's start by exploring some of Ecto's data retrieval functions.

My repo has been updated with the code for this post and can be found here.

Let's start by getting our feet wet in IEx. It helps to alias Repo and all the models so we don't have to prefix everything with FishingSpot while in an IEx session. A really nice way to do this is to create a .iex.exs file in the root of the project. Add the following:

  alias FishingSpot.Repo
alias FishingSpot.Location
alias FishingSpot.LocationTrip
alias FishingSpot.LocationType
alias FishingSpot.Fisherman
alias FishingSpot.PersonTrip
alias FishingSpot.FishLanded
alias FishingSpot.FishSpecies
alias FishingSpot.FlyType
alias FishingSpot.Trip


Open up the Elixir repl with our FishingSpot app running with:

iex -S mix


The Ecti.Repo module has a number of functions we can use to query the database. One of the most used is all/2. Here is how it can be used to retrieve all fishermen:

fishermen = Repo.all(Fisherman)
Enum.each(fishermen, fn(fisherman) -> IO.puts fisherman.name end)
#=>Mark
#=>Kirk
#=>Joe
#=>Lew


We can grab the fishermen with an id of 1 with get/3:

fisherman = Repo.get(Fisherman, 1)
fisherman.name #=> "Mark"


Can we also get a specific fisherman by name?

lew = Repo.get(Fisherman, name: "Lew")
** (Ecto.CastError) deps/ecto/lib/ecto/repo/queryable.ex:188: value [name: "Lew"] in where cannot be cast to type :id in query:

from p in FishingSpot.Fisherman,
where: p.id == ^[name: "Lew"]

Error when casting value to FishingSpot.Fisherman.id
(ecto) lib/ecto/query/planner.ex:124: Ecto.Query.Planner.prepare/3
(ecto) lib/ecto/query/planner.ex:66: Ecto.Query.Planner.query/4
(ecto) lib/ecto/repo/queryable.ex:91: Ecto.Repo.Queryable.execute/5
(ecto) lib/ecto/repo/queryable.ex:15: Ecto.Repo.Queryable.all/4
(ecto) lib/ecto/repo/queryable.ex:44: Ecto.Repo.Queryable.one/4


Nope. Repo.get/3 only looks in the primary key column. We need to use Repo.get_by/3.

lew = Repo.get_by(Fisherman, name: "Lew")
lew.name


These functions will return nil if no record is found. Both of those functions have a ! version that raises if no record is found.

Repo.get!(Fisherman, -1)
** (Ecto.NoResultsError) expected at least one result but got none in query:

from p in FishingSpot.Fisherman,
where: p.id == ^-1

(ecto) lib/ecto/repo/queryable.ex:57: Ecto.Repo.Queryable.one!/4

Repo.get_by!(Fisherman, name: "Ahab")
** (Ecto.NoResultsError) expected at least one result but got none in query:

from p in FishingSpot.Fisherman,
where: p.name == ^"Ahab"

(ecto) lib/ecto/repo/queryable.ex:57: Ecto.Repo.Queryable.one!/4


Also, get_by/3 and get_by!/3 will raise Ecto.MultipleResultsError if the more than one record is returned.

Repo.get_by!(FishLanded, fisherman_id: 1)
** (Ecto.MultipleResultsError) expected at most one result but got 247 in query:

from f in FishingSpot.FishLanded,
where: f.fisherman_id == ^1

(ecto) lib/ecto/repo/queryable.ex:58: Ecto.Repo.Queryable.one!/4


We can see from the error messages that get/2 and get_by/3 are delegating to one/2 and one!/2. Let's try that last query ourselves.

Repo.one(from f in FishLanded,
where: f.fisherman_id == 1)

** (RuntimeError) undefined function: f/0


Oops! What's this? Elixir tried to call f as a function. That is because, as I'm prone to do, I forgot to import Ecto's query API. Let's do that now and make sure we don't forget in the future. Pop open the .iex.exs file we created earlier and add:

import Ecto.Query


When we fire up IEx and try that again we get the same error as the get_by/1 query above, Ecto.MultipleResultsError. Let's try another:

Repo.one(from f in Fisherman,
where: f.name == "Lew")

date_of_birth: #Ecto.Date<1976-01-05>,
id: 4, inserted_at: #Ecto.DateTime<2015-09-10T16:51:34Z>, name: "Lew",
updated_at: #Ecto.DateTime<2015-09-10T16:51:34Z>}


Take a look at what is returned from the database. We get a struct that has fields for all the data in the record like :name. Notice that there are placeholders for all the associations with this strange #Ecto.Association.NotLoaded<association :fishermen_trips is not loaded> struct. Ecto does not return data for associated models unless we tell it to explicitly. This struct is like a friendly reminder that we haven't loaded the association. Let's see what happens when we try to do something with this struct.

fisherman = Repo.get_by(Fisherman, name: "Lew")
fisherman.fish_landed |> Enum.first
** (UndefinedFunctionError) undefined function: Enum.first/1


As you can see this makes for a very nice error message. If the association field was simply nil you would get this cryptic message and be left scratching your head.

** (UndefinedFunctionError) undefined function: Enum.first/1
(elixir) Enum.first(nil)


We can preload the associated records with preload/2:

fisherman = Repo.get_by(Fisherman, name: "Lew")
Enum.count(fisherman.fish_landed)

249


Rigging Up

Great! Now that we have all of the Ecto.Repo query methods under our belt we are ready to tie on some more advanced queries.

Typing these queries directly into IEx gets a bit awkward so lets create a file in lib called queries.exs. This file will start out like this:

defmodule FishingSpot.Queries do
alias FishingSpot.Repo
alias FishingSpot.FishLanded

import Ecto.Query
end


We will want to load and reload this file into IEx. As we have already seen we can load a file into IEx with c/2. I like to make a shortcut for this since I run it so often (and mistype the filename 90% of the time). Again open the .iex.exs file and add:

q = fn -> c("lib/queries.exs") end


Now calling q.() in IEx will reload the queries file.

Fish Stories

Of course the main thing any fisherman cares about is the size of the fish they catch. Let's implement a biggest_fish function in our Queries module:

def biggest_fish do
Repo.one(
from fish in FishLanded,
select: max(fish.length)
)
end


Disclaimer: Most Ecto query examples use the first letter of the model as a reference like this, from f in FishLanded, select: max(f.length). I prefer to spell out what I am referencing. There are a few reason for this. I have an allergic reaction to single letter variable names. This is probably because I am easily confused and forget which thing is which in my queries. Also, you will notice that we have a number of models in our schema that start with the same letter. Using only the first letter of a model's name as a reference breaks down very quickly.

Here is what I get when I run that in IEx with FishingSpot.Queries.biggest_fish:

07:21:26.274 [debug] SELECT max(f0."length") FROM "fish_landed" AS f0 [] OK query=1.1ms
#Decimal<74>


Ecto used the aggregate function max in the select and returned a decimal value of 74. We used one/2 because we knew there would be a single result. We could also have used all/2 but instead of a single value, Ecto would have returned a list with the value inside, [#Decimal<74>]. We also get a list, even when using one/2 if we use a list for the select clause like this:

  Repo.one(
from fish in FishLanded,
select: [max(fish.length)]
)
15:54:30.482 [debug] SELECT max(f0."length") FROM "fish_landed" AS f0 [] OK query=1.6ms
[#Decimal<74>]


This distinction is helpful to remember as using one style versus another can change how much post-processing you need to do on the data returned.

Did I mention those values are in inches? (Apologies to those metric fishermen out there.)

What!?! A six foot fish? That seems a little fishy. Let's see how many fish we have over two feet. We can start with this:

def fishy_fish do
Repo.all(
from fish in FishLanded,
where: fish.length > 24
)
end


Running that splats out a bunch of records. Not very helpful. Let's count those fish:

def fishy_fish_count do
Repo.all(
from fish in FishLanded,
select: count(fish.id),
where: fish.length > 24
)
end
06:31:43.126 [debug] SELECT count(f0."id") FROM "fish_landed" AS f0 WHERE (f0."length" > 24) [] OK query=6.1ms
[211]


As you would assume, Ecto has support for all the basic aggregate functions, max, min, avg, sum, and count.

We should find out who caught these massive fish.

Repo.all(
from fish in FishLanded,
join: fisherman in assoc(fish, :fisherman),
group_by: fisherman.name,
select: [max(fish.length), fisherman.name]
)
06:38:29.265 [debug] SELECT max(f0."length"), f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" GROUP BY f1."name" [] OK query=14.3ms
[[#Decimal<74>, "Mark"], [#Decimal<47>, "Kirk"], [#Decimal<30>, "Lew"],
[#Decimal<30>, "Joe"]]


(Now that we are warmed up, I'm only going to list the query and the output.)

This is our first join to another model. As you can see we simply tell Ecto what association to follow and it sets up an inner join to that table in the database. We then use the name of the fisherman to group the record. The records don't look like they are in any kind of order. Let's fix that up now.

Repo.all(
from fish in FishLanded,
join: fisherman in assoc(fish, :fisherman),
group_by: fisherman.name,
order_by: fisherman.name,
select: [max(fish.length), fisherman.name]
)

06:45:36.320 [debug] SELECT max(f0."length"), f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" GROUP BY f1."name" ORDER BY f1."name" [] OK query=2.4ms

[[#Decimal<30>, "Joe"], [#Decimal<47>, "Kirk"], [#Decimal<30>, "Lew"],
[#Decimal<74>, "Mark"]]


Much better.

What if we want to look for only the fisherman who caught the biggest fish? (Pretend we can't just figure it out from the list above) We have a couple of options. Let's first try using the query we already have. Can we just put the result of the biggest_fish/0 function into a where clause?

Repo.all(
from fish in FishLanded,
join: fisherman in assoc(fish, :fisherman),
where: fish.length == biggest_fish,
select: [fish.length, fisherman.name]
)


BOOM!

== Compilation error on file lib/queries.exs ==
** (Ecto.Query.CompileError) variable biggest_fish is not a valid query expression. Variables need to be explicitly interpolated in queries with ^
(ecto) expanding macro: Ecto.Query.where/3
lib/queries.exs:50: FishingSpot.Queries.biggest_fish_per_fisherman_ordered_max/0
(ecto) expanding macro: Ecto.Query.select/3
lib/queries.exs:50: FishingSpot.Queries.biggest_fish_per_fisherman_ordered_max/0
(ecto) expanding macro: Ecto.Query.from/2
lib/queries.exs:50: FishingSpot.Queries.biggest_fish_per_fisherman_ordered_max/0


This doesn't compile but Elixir is all about helpful error messages. It tells us exactly what to do. We need to interpolate the result into the second query using the pin operator, ^. Update the where clause to:

where: fish.length == ^biggest_fish,


Now it compiles and we can try again.

16:05:52.495 [debug] SELECT max(f0."length") FROM "fish_landed" AS f0 [] OK query=1.2ms

16:05:52.497 [debug] SELECT f0."length", f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" WHERE (f0."length" = \$1) [#Decimal<74>] OK query=1.4ms
[[#Decimal<74>, "Mark"]]


As you can see, the result of the first query is used in the second.

But wait, shouldn't we be able to do this in a single query? We can but we need to use some more advanced techniques. Let's try a method that looks very elegant but is always a little hard for me to wrap my brain around.

Repo.all(
from fish in FishLanded,
left_join: bigger_fish in FishLanded, on: fish.length < bigger_fish.length,
join: fisherman in assoc(fish, :fisherman),
where: is_nil(bigger_fish.id),
select: [fish.length, fisherman.name]
)

18:29:37.636 [debug] SELECT f0."length", f2."name" FROM "fish_landed" AS f0 LEFT OUTER JOIN "fish_landed" AS f1 ON f0."length" < f1."length" INNER JOIN "fishermen" AS f2 ON f2."id" = f0."fisherman_id" WHERE (f1."id" IS NULL) [] OK query=163.8ms
[[#Decimal<74>, "Mark"]]


It starts by joining FishLanded onto itself with a left join. The join should match the left side to the right if bigger_fish.length is greater than fish.length. Then we make sure that the left hand record is returned where there is no matching bigger_fish with where: is_nil(bigger_fish.id). The result is that only the row with the maximum length is returned because there are no other rows with greater lengths.

As is often the case, the solution's elegance comes with a price. Even with a small dataset this query can perform poorly. As you can see on my machine it took 163.8ms.

Here is yet another alternative.

Repo.all(
from fish in FishLanded,
join: fisherman in assoc(fish, :fisherman),
where: fragment(
"? IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)", fish.length
),
select: [fish.length, fisherman.name]
)

18:34:16.357 [debug] SELECT f0."length", f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" WHERE (f0."length" IN (SELECT MAX(fish.length) FROM fish_landed fish)) [] OK query=1.8ms
[[#Decimal<74>, "Mark"]]


This time we use the Ecto fragment/1 to put a subquery in our where clause. We grab the maximum length and then use that to return the entire record matching the maximum length using an IN clause. 1.8ms is quite a bit better performance. I think we should settle on this one.

Repo.one(
from fish in FishLanded,
join: fly_type in assoc(fish, :fly_type),
join: fish_species in assoc(fish, :fish_species),
join: fisherman in assoc(fish, :fisherman),
join: trip in assoc(fisherman, :trips),
join: locations in assoc(trip, :locations),
join: location_types in assoc(locations, :location_type),
where: fragment(
"? IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)", fish.length
),
where: fish.date_and_time >= trip.start_date,
where: fish.date_and_time <= date_add(trip.end_date, 1, "day"),
select: %{
length: fish.length,
date_caught: fish.date_and_time,
fish_type: fish_species.name,
fly: fly_type.name,
fisherman: fisherman.name,
trip_start: trip.start_date,
trip_end: trip.end_date,
location: locations.name,
location_type: location_types.name
})

18:33:48.875 [debug] SELECT f0."length", f0."date_and_time", f2."name", f1."name", f3."name", t4."start_date", t4."end_date", l5."name", l6."name" FROM "fish_landed" AS f0 INNER JOIN "fly_types" AS f1 ON f1."id" = f0."fly_type_id" INNER JOIN "fish_species" AS f2 ON f2."id" = f0."fish_species_id" INNER JOIN "fishermen" AS f3 ON f3."id" = f0."fisherman_id" INNER JOIN "fishermen_trips" AS f7 ON f7."fisherman_id" = f3."id" INNER JOIN "trips" AS t4 ON t4."id" = f7."trip_id" INNER JOIN "locations_trips" AS l8 ON l8."trip_id" = t4."id" INNER JOIN "locations" AS l5 ON l5."id" = l8."location_id" INNER JOIN "location_types" AS l6 ON l6."id" = l5."location_type_id" WHERE (f0."length" IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)) AND (f0."date_and_time" >= t4."start_date") AND (f0."date_and_time" <= (t4."end_date"::date + interval '1 day')::date) [] OK query=634.4ms

%{
date_caught: #Ecto.DateTime<2014-08-15T06:36:29Z>,
fish_type: "Rainbow Trout",
fisherman: "Mark",
fly: "Copper John",
length: #Decimal<74>,
location: "Lake Fork",
location_type: "River",
trip_end: #Ecto.Date<2014-08-18>,
trip_start: #Ecto.Date<2014-08-08>
}


Whoa! This is a monster query. We've now joined in all the models and have every possible piece of information about the big fish. There are a couple of points to make about this query. First, I have switched to the select to map syntax so that I can indicate what values return where. With so many return values using a list gets unwieldy. Additionally, there is no way to associate the landing of a fish with the trip the fisherman was on except by comparing dates. I'm looking for the trip where the date the fish was caught falls between the start and end of the trip. There is no between in Ecto so we resort to two where clauses (we could also use and/2 to put the clauses together). One issue arises from comparing the trip dates to the date_and_time field which is a datetime. If the fish was caught at some time on the last day of the trip then it will not be less than or equal to the trip end date. You can see for yourself if you fire up psql that a date type is cast to a timestamp with the time at 00:00:00.

select now() < now()::date;
?column?
----------
f
(1 row)

select (now()::date)::timestamp;
now
---------------------
2015-09-28 00:00:00
(1 row)


To solve this issue we use Ecto's date_add/3 function to add a day to the end of the trip so that it effectively falls on midnight of that day.

Oops: When I first wrote the sample data generator in Part 1 I neglected to ensure that the dates used for fish landed were calculated using one of the fisherman's trips. Also, I was missing aliases in some of the models. If you are using my old code you will want to pull the fixes.

I'm going to call it a day on this one. In Part 3 we will look at some other advanced querying techniques as well as Ecto's preloading that allows us to work with a record's associated data. Thanks for joining me on this adventure.

P.S. I have used the "keyword" style for all the Ecto queries in this post. Ecto has an alternative "expression" style. Drew Olson has a great post comparing and combining the two styles. Composable Queries with Ecto

Acknowledgements

A huge thanks to Kim Barnes (@Kimberlydbarnes) for reviewing and editing this post.