A Bit About Ecto Bindings

Names Come and Go but Position is Forever

Nearly all the macros in Ecto.Query take a binding as the second or third argument. In all the examples in the documentation these amount to a list of single variable letter names. These letters are then used in the following arguments to setup parts of a query.

I've always found bindings to be a bit mysterious. Why don't the names matter? How do they magically figure out what tables to match up to in a query?

The documentation at http://hexdocs.pm/ecto/Ecto.Query.html#from/2 has this to say:

def published(query) do
  from p in query, where: p.published_at != nil
end

Notice we have created a p variable to represent each item in the query. In case the given query has more than one from expression, each of them must be given in the order they were bound:

def published_multi(query) do
  from [p,o] in query,
  where: p.published_at != nil and o.published_at != nil
end

Note the variables p and o must be named however you find more convenient as they have no importance in the query sent to the database.

Ok. This tells us that the names are unimportant and that the order is important if the query has multiple froms. I find this a bit opaque.

Assuming "multiple from expressions" really translates to any operation that adds a table to a query then we have to build up a list of bindings for each model we join onto the initial from. Here is an example of a query with lots of joins:

FishLanded  
    |> join(:inner, [fish], fly_type in assoc(fish, :fly_type))
    |> join(:inner, [fish, fly_type], fish_species in assoc(fish, :fish_species))
    |> join(:inner, [fish, fly_type, fish_type], fisherman in assoc(fish, :fisherman))
    |> join(:inner, [fish, fly_type, fish_type, fisherman], trip in assoc(fisherman, :trips))
    |> join(:inner, [fish, fly_type, fish_type, fisherman, trip],
            locations in assoc(trip, :locations))
    |> join(:inner, [fish, fly_type, fish_type, fisherman, trip, location],
            location_types in assoc(location, :location_type))
    |> select([fish], count(fish.id))


#Ecto.Query<from f0 in FishingSpot.FishLanded, join: f1 in assoc(f0, :fly_type),
 join: f2 in assoc(f0, :fish_species), join: f3 in assoc(f0, :fisherman),
 join: t in assoc(f3, :trips), join: l0 in assoc(t, :locations),
 join: l1 in assoc(l0, :location_type), select: count(f0.id)>

SELECT count(f0."id") FROM "fish_landed" AS f0 INNER JOIN "fly_types" AS f1 ON f1."id" = f0."fly_type_id" INNER JOIN "fish_species" AS f2 ON f2."id" = f0."fish_species_id" INNER JOIN "fishermen" AS f3 ON f3."id" = f0."fisherman_id" INNER JOIN "fishermen_trips" AS f7 ON f7."fisherman_id" = f3."id" INNER JOIN "trips" AS t4 ON t4."id" = f7."trip_id" INNER JOIN "locations_trips" AS l8 ON l8."trip_id" = t4."id" INNER JOIN "locations" AS l5 ON l5."id" = l8."location_id" INNER JOIN "location_types" AS l6 ON l6."id" = l5."location_type_id"  

I have included the inspected version of the query as well as the actual SQL the query produces. As you can see, we have an ever growing list of bindings with each new join. But wait, the docs implied that only the position is important. In the first three joins of this query we are linking to the model that is in the first position, FishLanded. Do we need all the bindings all the time?

FishLanded  
      |> join(:inner, [fish], fly_type in assoc(fish, :fly_type))
      |> join(:inner, [fish], fish_species in assoc(fish, :fish_species))
      |> join(:inner, [fish], fisherman in assoc(fish, :fisherman))
      |> join(:inner, [fish, fly_type, fish_type, fisherman],
              trip in assoc(fisherman, :trips))
      |> join(:inner, [fish, fly_type, fish_type, fisherman, trip],
              locations in assoc(trip, :locations))
      |> join(:inner, [fish, fly_type, fish_type, fisherman, trip, location],
              location_types in assoc(location, :location_type))
      |> select([fish], count(fish.id))

#Ecto.Query<from f0 in FishingSpot.FishLanded, join: f1 in assoc(f0, :fly_type),
 join: f2 in assoc(f0, :fish_species), join: f3 in assoc(f0, :fisherman),
 join: t in assoc(f3, :trips), join: l0 in assoc(t, :locations),
 join: l1 in assoc(l0, :location_type), select: count(f0.id)>

SELECT count(f0."id") FROM "fish_landed" AS f0 INNER JOIN "fly_types" AS f1 ON f1."id" = f0."fly_type_id" INNER JOIN "fish_species" AS f2 ON f2."id" = f0."fish_species_id" INNER JOIN "fishermen" AS f3 ON f3."id" = f0."fisherman_id" INNER JOIN "fishermen_trips" AS f7 ON f7."fisherman_id" = f3."id" INNER JOIN "trips" AS t4 ON t4."id" = f7."trip_id" INNER JOIN "locations_trips" AS l8 ON l8."trip_id" = t4."id" INNER JOIN "locations" AS l5 ON l5."id" = l8."location_id" INNER JOIN "location_types" AS l6 ON l6."id" = l5."location_type_id"  

No! We can leave out bindings for positions we are not using in that particular macro call. What about names? Can they truly be arbitrary?

FishLanded  
      |> join(:inner, [fish], fly_type in assoc(fish, :fly_type))
      |> join(:inner, [nemo], fish_species in assoc(nemo, :fish_species))
      |> join(:inner, [bait], fisherman in assoc(bait, :fisherman))
      |> join(:inner, [foo, bar, baz, ahab], trip in assoc(ahab, :trips))
      |> join(:inner, [foo, bar, baz, ahab, set_sail], locations in assoc(set_sail, :locations))
      |> join(:inner, [x, y, z, a, b, c], location_types in assoc(c, :location_type))
      |> select([whatever], count(whatever.id))

#Ecto.Query<from f0 in FishingSpot.FishLanded, join: f1 in assoc(f0, :fly_type),
 join: f2 in assoc(f0, :fish_species), join: f3 in assoc(f0, :fisherman),
 join: t in assoc(f3, :trips), join: l0 in assoc(t, :locations),
 join: l1 in assoc(l0, :location_type), select: count(f0.id)>

SELECT count(f0."id") FROM "fish_landed" AS f0 INNER JOIN "fly_types" AS f1 ON f1."id" = f0."fly_type_id" INNER JOIN "fish_species" AS f2 ON f2."id" = f0."fish_species_id" INNER JOIN "fishermen" AS f3 ON f3."id" = f0."fisherman_id" INNER JOIN "fishermen_trips" AS f7 ON f7."fisherman_id" = f3."id" INNER JOIN "trips" AS t4 ON t4."id" = f7."trip_id" INNER JOIN "locations_trips" AS l8 ON l8."trip_id" = t4."id" INNER JOIN "locations" AS l5 ON l5."id" = l8."location_id" INNER JOIN "location_types" AS l6 ON l6."id" = l5."location_type_id"  

I strongly recommend you avoid writing queries this confusing. It illustrates the point that the binding names are only relevant within the macro call in which they are defined. There is no magic tie between names as the query is built. This allows for a great deal of flexibility, but can also be a source of confusion.

What if we accidentally add more bindings than there are sources in the query?

FishLanded  
            |> join(:inner, [fish, foo, bar, baz], fisherman in assoc(fish, :fisherman))
            |> join(:inner, [fish, fisherman],
                    trip in assoc(fisherman, :trips))
            |> select([fish], count(fish.id))

#Ecto.Query<from f0 in FishingSpot.FishLanded,
 join: f1 in assoc(f0, :fisherman), join: t in assoc(f1, :trips),
 select: count(f0.id)>

SELECT count(f0."id") FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" INNER JOIN "fishermen_trips" AS f3 ON f3."fisherman_id" = f1."id" INNER JOIN "trips" AS t2 ON t2."id" = f3."trip_id"  

Ecto simply ignores the extra bindings.

And if we forget one?

FishLanded  
        |> join(:inner, [fish, foo, bar, baz], fisherman in assoc(fish, :fisherman))
        |> join(:inner, [fish, fisherman],
                trip in assoc(fisherman, :trips))
        |> join(:inner, [fish, trip],
                locations in assoc(trip, :locations))
        |> select([fish], count(fish.id))

#Ecto.Query<from f0 in FishingSpot.FishLanded,
 join: f1 in assoc(f0, :fisherman), join: t in assoc(f1, :trips),
 join: l in assoc(f1, :locations), select: count(f0.id)>

** (Ecto.QueryError) lib/queries.exs:166: could not find association `locations` on model FishingSpot.Fisherman in query:

from f0 in FishingSpot.FishLanded,  
  join: f1 in assoc(f0, :fisherman),
  join: t in assoc(f1, :trips),
  join: l in assoc(f1, :locations),
  select: count(f0.id)

    (ecto) lib/ecto/repo/queryable.ex:91: Ecto.Repo.Queryable.execute/5
    (ecto) lib/ecto/repo/queryable.ex:15: Ecto.Repo.Queryable.all/4
           lib/queries.exs:170: FishingSpot.Queries.composed_biggest_fish_details/0

Things go badly, of course, because Ecto attempts to join on the wrong model.

A binding is a hook that Ecto can use within macro calls to determine which fields belong to which data source. It uses the variable names inside the macro call and the variable position between calls.

I encourage you to use your best judgement to write clean and clear Ecto queries. Personally I prefer the first version of the query above as it makes all the bindings and positions explicit all the way down.

I hope this takes some of the mystery out of bindings.

Acknowledgements

A huge thanks to Kim Barnes (@Kimberlydbarnes) and David Madouros (@dmadouros) for reviewing and updating the styling of this post.

Comments

The Ecto Query Library

This is basically the TL;DR for my Diving into Ecto series. I always hate having to skim a long blog post looking for a quick answer, and I know you do too. With that in mind I'm going to make this post a list of common and not so common queries you can do with Ecto. I will keep it up-to-date as I discover more interesting queries and ways of using Ecto.

The repo with this post reproduced in the README can be found at https://github.com/parkerl/ectoquerylibrary. The queries below can be found in a usable form in lib/fishing_spot/queries.ex. If you find something incorrect please open a pull request. I would like to make this a community resource rather than my personal toolbox.

For the full story behind this project and the queries start here Diving into Ecto: Part 1.

Versions

The query library is currently built using the following setup. Some features are only avaiable on Ecto master. I will strive to indicate which queries will only run on master.

  • Elixir 1.2
  • Ecto master
  • Postgres 9.4

Table of Contents


Simple Queries with Different Select Styles

Demonstrates how the various select styles change the return structure.

# Default: The entire model comes back as a list of structs.
Repo.all(  
  from fisherman in Fisherman
)

06:11:18.292 [debug] SELECT f0."id", f0."inserted_at", f0."updated_at", f0."name", f0."date_of_birth" FROM "fishermen" AS f0 [] OK query=0.5ms  
[%FishingSpot.Fisherman{meta: #Ecto.Schema.Metadata<:loaded>,
  date_of_birth: #Ecto.Date<1970-01-02>,
  fish_landed: #Ecto.Association.NotLoaded<association :fish_landed is not loaded>,
  fishermen_trips: #Ecto.Association.NotLoaded<association :fishermen_trips is not loaded>,
  id: 1, inserted_at: #Ecto.DateTime<2015-09-29T12:05:05Z>, name: "Mark",
  trips: #Ecto.Association.NotLoaded<association :trips is not loaded>,
  updated_at: #Ecto.DateTime<2015-09-29T12:05:05Z>},

# The same as above. The entire model comes back as a list of structs.
Repo.all(  
  from fisherman in Fisherman,
  select: fisherman
)

06:11:18.292 [debug] SELECT f0."id", f0."inserted_at", f0."updated_at", f0."name", f0."date_of_birth" FROM "fishermen" AS f0 [] OK query=0.5ms  
[%FishingSpot.Fisherman{meta: #Ecto.Schema.Metadata<:loaded>,
  date_of_birth: #Ecto.Date<1970-01-02>,
  fish_landed: #Ecto.Association.NotLoaded<association :fish_landed is not loaded>,
  fishermen_trips: #Ecto.Association.NotLoaded<association :fishermen_trips is not loaded>,
  id: 1, inserted_at: #Ecto.DateTime<2015-09-29T12:05:05Z>, name: "Mark",
  trips: #Ecto.Association.NotLoaded<association :trips is not loaded>,
  updated_at: #Ecto.DateTime<2015-09-29T12:05:05Z>},

# Selects only the given fields. Returns a list of lists. 
Repo.all(  
  from fisherman in Fisherman,
  select: [fisherman.name, fisherman.date_of_birth]
)

06:11:18.305 [debug] SELECT f0."name", f0."date_of_birth" FROM "fishermen" AS f0 [] OK query=0.4ms  
[["Mark", #Ecto.Date<1970-01-02>], ["Kirk", #Ecto.Date<1978-03-05>],
 ["Joe", #Ecto.Date<1973-10-15>], ["Lew", #Ecto.Date<1976-01-05>]]

# Selects only the given fields. Returns a list of tuples. 
Repo.all(  
  from fisherman in Fisherman,
  select: { fisherman.name, fisherman.date_of_birth }
)

06:11:18.306 [debug] SELECT f0."name", f0."date_of_birth" FROM "fishermen" AS f0 [] OK query=0.4ms  
[{"Mark", #Ecto.Date<1970-01-02>}, {"Kirk", #Ecto.Date<1978-03-05>},
 {"Joe", #Ecto.Date<1973-10-15>}, {"Lew", #Ecto.Date<1976-01-05>}]

# Selects only the given fields. Returns a list of maps with data in the given keys. 
Repo.all(  
  from fisherman in Fisherman,
  select: %{ fisherman_name: fisherman.name, fisherman_dob: fisherman.date_of_birth }
)

06:11:18.307 [debug] SELECT f0."name", f0."date_of_birth" FROM "fishermen" AS f0 [] OK query=0.3ms  
[%{fisherman_dob: #Ecto.Date<1970-01-02>, fisherman_name: "Mark"},
 %{fisherman_dob: #Ecto.Date<1978-03-05>, fisherman_name: "Kirk"},
 %{fisherman_dob: #Ecto.Date<1973-10-15>, fisherman_name: "Joe"},
 %{fisherman_dob: #Ecto.Date<1976-01-05>, fisherman_name: "Lew"}]

Select Distinct

from fish in FishLanded,  
select: fish.weight,  
distinct: true

12:18:21.346 [debug] SELECT DISTINCT f0."weight" FROM "fish_landed" AS f0 [] OK query=9.5ms decode=3.0ms  

Distinct On Expression

from fish in FishLanded,  
distinct: fish.weight

2:26:59.260 [debug] SELECT DISTINCT ON (f0."weight") f0."id", f0."inserted_at", f0."updated_at", f0."date_and_time", f0."weight", f0."length", f0."fisherman_id", f0."location_id", f0."fly_type_id", f0."fish_species_id" FROM "fish_landed" AS f0 [] OK query=3.3ms decode=0.1ms queue=0.1ms

[%FishingSpot.FishLanded{__meta__: #Ecto.Schema.Metadata<:loaded>,
    date_and_time: #Ecto.DateTime<2012-06-12 21:51:26>,
    fish_species: #Ecto.Association.NotLoaded<association :fish_species is not loaded>,
    fish_species_id: 1,
    fisherman: #Ecto.Association.NotLoaded<association :fisherman is not loaded>,
    fisherman_id: 1,
    fly_type: #Ecto.Association.NotLoaded<association :fly_type is not loaded>,
    fly_type_id: 3, id: 886, inserted_at: #Ecto.DateTime<2016-02-18 04:51:26>,
    length: #Decimal<6>,
    location: #Ecto.Association.NotLoaded<association :location is not loaded>,
    location_id: 1, updated_at: #Ecto.DateTime<2016-02-18 04:51:26>,
    weight: #Decimal<1>},...

Distinct with Order By

from fish in FishLanded,  
distinct: fish.weight,  
order_by: fish.date_and_time


12:37:15.371 [debug] SELECT DISTINCT ON (f0."weight") f0."id", f0."inserted_at", f0."updated_at", f0."date_and_time", f0."weight", f0."length", f0."fisherman_id", f0."location_id", f0."fly_type_id", f0."fish_species_id" FROM "fish_landed" AS f0  
ORDER BY f0."weight", f0."date_and_time" [] OK query=4.9ms decode=0.2ms

[%FishingSpot.FishLanded{__meta__: #Ecto.Schema.Metadata<:loaded>,
    date_and_time: #Ecto.DateTime<2012-06-06 21:51:25>,
    fish_species: #Ecto.Association.NotLoaded<association :fish_species is not loaded>,
    fish_species_id: 2,
    fisherman: #Ecto.Association.NotLoaded<association :fisherman is not loaded>,
    fisherman_id: 3,
    fly_type: #Ecto.Association.NotLoaded<association :fly_type is not loaded>,
    fly_type_id: 2, id: 571, inserted_at: #Ecto.DateTime<2016-02-18 04:51:25>,
    length: #Decimal<1>,
    location: #Ecto.Association.NotLoaded<association :location is not loaded>,
    location_id: 2, updated_at: #Ecto.DateTime<2016-02-18 04:51:25>,
    weight: #Decimal<1>}, ...

Max

from fish in FishLanded,  
select: max(fish.length)  

Simple Where

from fish in FishLanded,  
where: fish.length > 24  

Count

from fish in FishLanded,  
select: count(fish.id),  
where: fish.length > 24  

Count Distinct

from fish in FishLanded,  
select: count(fish.fisherman_id, :distinct)  

Group By with Max

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
group_by: fisherman.name,  
select: [max(fish.length), fisherman.name]  

Group By with Having

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
group_by: fisherman.name,  
having: count(fish.id) > 15,  
select: [max(fish.length), fisherman.name]  

Limit

from fish in FishLanded,  
limit: 10  

Limit and Offset

Demonstrates the use of limit and offset using a calculated value. Also, demonstrates the use of Repo.aggregate/3.

fish_count = Repo.aggregate(FishLanded, :count, :id) |> div(2)

Repo.all(  
 from fish in FishLanded,
 limit: 10,
 offset: ^fish_count
)

Order By

Demonstrates ordering ascending and descending including keyword syntax.

from fisherman in Fisherman,  
order_by: fisherman.name,  
select: fisherman.name

21:50:02.022 [debug] SELECT f0."name" FROM "fishermen" AS f0 ORDER BY f0."name" [] OK query=4.0ms  
["Joe", "Kirk", "Lew", "Mark"]

from fisherman in Fisherman,  
order_by: [desc: fisherman.name],  
select: fisherman.name

21:50:02.025 [debug] SELECT f0."name" FROM "fishermen" AS f0 ORDER BY f0."name" DESC [] OK query=0.5ms  
["Mark", "Lew", "Kirk", "Joe"]

from fisherman in Fisherman,  
order_by: :name,  
select: fisherman.name

21:50:02.022 [debug] SELECT f0."name" FROM "fishermen" AS f0 ORDER BY f0."name" [] OK query=4.0ms  
["Joe", "Kirk", "Lew", "Mark"]

from fisherman in Fisherman,  
order_by: [desc: :name],  
select: fisherman.name

21:50:02.025 [debug] SELECT f0."name" FROM "fishermen" AS f0 ORDER BY f0."name" DESC [] OK query=0.5ms  
["Mark", "Lew", "Kirk", "Joe"]

fields = [:name, :date_of_birth]  
13:40:42.863 [debug] SELECT f0."name" FROM "fishermen" AS f0 ORDER BY f0."name", f0."date_of_birth" [] OK query=0.2ms  
from fisherman in Fisherman,  
order_by: ^fields,  
select: fisherman.name

13:40:42.863 [debug] SELECT f0."name" FROM "fishermen" AS f0 ORDER BY f0."name", f0."date_of_birth" [] OK query=0.2ms  

Record with Max Value in Two Steps

Demonstrates interpolating the result of one query into another.

[big_fish] = Repo.all(
  from fish in FishLanded,
  select: max(fish.length)
 )

 Repo.all(
   from fish in FishLanded,
   join: fisherman in assoc(fish, :fisherman),
   where: fish.length == ^big_fish, 
   select: [fish.length, fisherman.name]
 )

Record with Max Value via Self Join

Demonstrates left joins, self joins, and conditions in joins. Calculates the
record with a maximum value by "folding" onto the same table.

from fish in FishLanded,  
left_join: bigger_fish in FishLanded, on: fish.length < bigger_fish.length,  
join: fisherman in assoc(fish, :fisherman),  
where: is_nil(bigger_fish.id),  
select: [fish.length, fisherman.name]  

Record with Max Value via Subquery

Demonstrates subqueries in where clauses.

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
where: fragment(  
    "? IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)", fish.length
  ),
select: [fish.length, fisherman.name]  

Record with Max Value via Join Fragment

Demonstrates the use of fragment in joins.

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
join: big_fish in fragment(  
    "(SELECT MAX(biggest_fish.length) AS length FROM fish_landed biggest_fish)"
  ),
on: fish.length == big_fish.length,  
select: [fish.length, fisherman.name]  

Keyword Where

Demonstrates the use of a keyword list for generating where clauses. Values are ANDd. Also, shows that variables will be interpolated.

    {_, date} = Ecto.Date.cast("1976-01-05")

    Repo.all(
      from fisherman in Fisherman,
      where: [name: "Lew", date_of_birth: ^date]
    )

    => SELECT f0."id", f0."inserted_at", f0."updated_at", f0."name", f0."date_of_birth" 
    FROM "fishermen" AS f0 
    WHERE ((f0."name" = 'Lew') 
      AND (f0."date_of_birth" = $1)) 
      [{1976, 1, 5}]

    where(Fisherman, [name: "Lew", date_of_birth: ^date]) |> Repo.all

    => SELECT f0."id", f0."inserted_at", f0."updated_at", f0."name", f0."date_of_birth" 
    FROM "fishermen" AS f0 
    WHERE ((f0."name" = 'Lew') 
      AND (f0."date_of_birth" = $1)) 
      [{1976, 1, 5}] 

Keyword Where Referencing Another Model

Demonstrates referencing another model in a keyword where clause. Also shows that no join condition is required by join. It defaults to ON TRUE.

join(Fisherman, :inner, [], fish_landed in FishLanded)  
  |> where([fisherman, fish_landed], [name: "Lew", date_of_birth: ^date, id: fish_landed.fisherman_id])
  |> Repo.all

=> SELECT f0."id", f0."inserted_at", f0."updated_at", f0."name", f0."date_of_birth" 
FROM "fishermen" AS f0  
INNER JOIN "fish_landed" AS f1 ON TRUE  
WHERE (((f0."name" = 'Lew')  
 AND (f0."date_of_birth" = $1)) 
 AND (f0."id" = f1."fisherman_id")) 

[{1976, 1, 5}] 

Where with In Clause

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
where: fisherman.name in ["Mark", "Kirk"],  
group_by: fisherman.name,  
order_by: fisherman.name,  
select: %{biggest_fish: max(fish.length), fisherman: fisherman.name}  

Where with Not In Clause

Demonstrates the use of not to negate an in clause.

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
where: not fisherman.name in ["Mark", "Kirk"],  
group_by: fisherman.name,  
order_by: fisherman.name,  
select: %{biggest_fish: max(fish.length), fisherman: fisherman.name}  

Complex Muti-join Multi-where

Demonstrates joins, sub-querying and using map syntax in the select.
Uses the date_add/3 function. Demonstrates how to accomplish a "between" where clause.

from fish in FishLanded,  
  join: fly_type in assoc(fish, :fly_type),
  join: fish_species in assoc(fish, :fish_species),
  join: fisherman in assoc(fish, :fisherman),
  join: trip in assoc(fisherman, :trips),
  join: locations in assoc(trip, :locations),
  join: location_types in assoc(locations, :location_type),
  where: fragment(
    "? IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)", fish.length
  ),
  where: fish.date_and_time >= trip.start_date,
  where: fish.date_and_time <= date_add(trip.end_date, 1, "day"),
  select: %{
    length: fish.length,
    date_caught: fish.date_and_time,
    fish_type: fish_species.name,
    fly: fly_type.name,
    fisherman: fisherman.name,
    trip_start: trip.start_date,
    trip_end: trip.end_date,
    location: locations.name,
    location_type: location_types.name
  }

Using a Select Fragment

with Named Grouping and Positional Ordering

Demonstrates how to use a named column from a fragment or a positional
column from an aggregate function in grouping or ordering.

from fish in FishLanded,  
group_by: fragment("date"),  
order_by: fragment("2"),  
select: %{  
  date: fragment("date_trunc('day', ?) AS date", field(fish, :date_and_time)),
  fish_count: count(fish.id)
}

Complex Select Logic Using a Fragment

Demonstrates how to use multiple columns to calculate a value.
Also orders descending using positional column selection.

from fish in FishLanded,  
join: fisherman in assoc(fish, :fisherman),  
order_by: [desc: fragment("1")],  
select: %{  
  bool: fragment(
    "((? = 'Kirk' OR ? = 'Mark') AND NOT ? < 10) AS crazy_select",
    field(fisherman, :name),
    field(fisherman, :name),
    field(fish, :length)),
  fisherman: fisherman.name
}

Partial-preloading

Demonstrates how to select only parts of a join model in a preload.
Uses both map and list select syntax.

    query = Fisherman
        |> join(:inner, [fisherman], fish in assoc(fisherman, :fish_landed))
        |> where([fisherman], fisherman.id == 1)
        |> select([fisherman, fish], %{fisherman: fisherman, length: fish.length})
        |> preload([fisherman, fish], [fish_landed: fish])

    Repo.first(query).fisherman.fish_landed |> IO.inspect

    query = Fisherman
        |> join(:inner, [fisherman], fish in assoc(fisherman, :fish_landed))
        |> where([fisherman], fisherman.id == 1)
        |> select([fisherman, fish], %{fisherman: fisherman, length: fish.length})
        |> preload([fisherman, fish], [fish_landed: fish])

    fisherman = Repo.first(query) |> List.first
    fisherman.fish_landed |> IO.inspect

Binding Fun

Demonstrates that bindings are order and not name dependent. See http://www.glydergun.com/a-bit-about-bindings/.

FishLanded  
      |> join(:inner, [fish], fly_type in assoc(fish, :fly_type))
      |> join(:inner, [fish, fly_type], fish_species in assoc(fish, :fish_species))
      |> join(:inner, [fish, fly_type, fish_type], fisherman in assoc(fish, :fisherman))
      |> join(:inner, [fish, fly_type, fish_type, fisherman], trip in assoc(fisherman, :trips))
      |> join(:inner, [fish, fly_type, fish_type, fisherman, trip],
              locations in assoc(trip, :locations))
      |> join(:inner, [fish, fly_type, fish_type, fisherman, trip, location],
              location_types in assoc(location, :location_type))
      |> select([fish], count(fish.id))

FishLanded  
      |> join(:inner, [fish], fly_type in assoc(fish, :fly_type))
      |> join(:inner, [fish], fish_species in assoc(fish, :fish_species))
      |> join(:inner, [fish], fisherman in assoc(fish, :fisherman))
      |> join(:inner, [fish, fly_type, fish_type, fisherman],
              trip in assoc(fisherman, :trips))
      |> join(:inner, [fish, fly_type, fish_type, fisherman, trip],
              locations in assoc(trip, :locations))
      |> join(:inner, [fish, fly_type, fish_type, fisherman, trip, location],
              location_types in assoc(location, :location_type))
      |> select([fish], count(fish.id))

FishLanded  
      |> join(:inner, [fish], fly_type in assoc(fish, :fly_type))
      |> join(:inner, [nemo], fish_species in assoc(nemo, :fish_species))
      |> join(:inner, [bait], fisherman in assoc(bait, :fisherman))
      |> join(:inner, [foo, bar, baz, ahab], trip in assoc(ahab, :trips))
      |> join(:inner, [foo, bar, baz, ahab, set_sail], locations in assoc(set_sail, :locations))
      |> join(:inner, [x, y, z, a, b, c], location_types in assoc(c, :location_type))
      |> select([whatever], count(whatever.id))

Working with Prefixes

Demonstrates how to work with schemas other than "public" in Postgres.

# The migration
  def change do
    execute "CREATE SCHEMA users"

    create table(:accounts, prefix: :users) do
      add :identifier, :string
      add :name,       :string

      timestamps
    end
  end

# Inserting data
    Repo.insert(
      Ecto.Model.put_meta( 
      %Account{ identifier: "lew@example.com",  name: "Lew"  }, 
      prefix: "users"
      )
    )
    Repo.insert(
      Ecto.Model.put_meta(
       %Account{ identifier: "mark@example.com", name: "Mark" }, 
       prefix: "users" 
      )
    )
    Repo.insert(
     Ecto.Model.put_meta(
       %Account{ identifier: "john@example.com", name: "John" }, 
       prefix: "users"
     )
    )

#Querying
    query = from accounts in Account
    Repo.all(%{query | prefix: "users"})

Comments

Diving into Ecto Part 2

Trolling for Data

In Part 1 we setup our schema and created sample data for our FishingSpot app. Now we are finally ready to get our hands dirty querying that data. We have fishermen, fishing trips and a bunch of fish in our database. Let's start by exploring some of Ecto's data retrieval functions.

My repo has been updated with the code for this post and can be found here.

Let's start by getting our feet wet in IEx. It helps to alias Repo and all the models so we don't have to prefix everything with FishingSpot while in an IEx session. A really nice way to do this is to create a .iex.exs file in the root of the project. Add the following:

  alias FishingSpot.Repo
  alias FishingSpot.Location
  alias FishingSpot.LocationTrip
  alias FishingSpot.LocationType
  alias FishingSpot.Fisherman
  alias FishingSpot.PersonTrip
  alias FishingSpot.FishLanded
  alias FishingSpot.FishSpecies
  alias FishingSpot.FlyType
  alias FishingSpot.Trip

Open up the Elixir repl with our FishingSpot app running with:

iex -S mix  

The Ecti.Repo module has a number of functions we can use to query the database. One of the most used is all/2. Here is how it can be used to retrieve all fishermen:

fishermen = Repo.all(Fisherman)  
Enum.each(fishermen, fn(fisherman) -> IO.puts fisherman.name end)  
#=>Mark
#=>Kirk
#=>Joe
#=>Lew

We can grab the fishermen with an id of 1 with get/3:

fisherman = Repo.get(Fisherman, 1)  
fisherman.name #=> "Mark"  

Can we also get a specific fisherman by name?

lew = Repo.get(Fisherman, name: "Lew")  
** (Ecto.CastError) deps/ecto/lib/ecto/repo/queryable.ex:188: value `[name: "Lew"]` in `where` cannot be cast to type :id in query:

from p in FishingSpot.Fisherman,  
  where: p.id == ^[name: "Lew"]

Error when casting value to `FishingSpot.Fisherman.id`  
    (ecto) lib/ecto/query/planner.ex:124: Ecto.Query.Planner.prepare/3
    (ecto) lib/ecto/query/planner.ex:66: Ecto.Query.Planner.query/4
    (ecto) lib/ecto/repo/queryable.ex:91: Ecto.Repo.Queryable.execute/5
    (ecto) lib/ecto/repo/queryable.ex:15: Ecto.Repo.Queryable.all/4
    (ecto) lib/ecto/repo/queryable.ex:44: Ecto.Repo.Queryable.one/4

Nope. Repo.get/3 only looks in the primary key column. We need to use Repo.get_by/3.

lew = Repo.get_by(Fisherman, name: "Lew")  
lew.name  

These functions will return nil if no record is found. Both of those functions have a ! version that raises if no record is found.

Repo.get!(Fisherman, -1)  
** (Ecto.NoResultsError) expected at least one result but got none in query:

from p in FishingSpot.Fisherman,  
  where: p.id == ^-1

    (ecto) lib/ecto/repo/queryable.ex:57: Ecto.Repo.Queryable.one!/4
Repo.get_by!(Fisherman, name: "Ahab")  
** (Ecto.NoResultsError) expected at least one result but got none in query:

from p in FishingSpot.Fisherman,  
  where: p.name == ^"Ahab"

    (ecto) lib/ecto/repo/queryable.ex:57: Ecto.Repo.Queryable.one!/4

Also, get_by/3 and get_by!/3 will raise Ecto.MultipleResultsError if the more than one record is returned.

Repo.get_by!(FishLanded, fisherman_id: 1)  
** (Ecto.MultipleResultsError) expected at most one result but got 247 in query:

from f in FishingSpot.FishLanded,  
  where: f.fisherman_id == ^1

    (ecto) lib/ecto/repo/queryable.ex:58: Ecto.Repo.Queryable.one!/4

We can see from the error messages that get/2 and get_by/3 are delegating to one/2 and one!/2. Let's try that last query ourselves.

Repo.one(from f in FishLanded,  
          where: f.fisherman_id == 1)

** (RuntimeError) undefined function: f/0

Oops! What's this? Elixir tried to call f as a function. That is because, as I'm prone to do, I forgot to import Ecto's query API. Let's do that now and make sure we don't forget in the future. Pop open the .iex.exs file we created earlier and add:

import Ecto.Query  

When we fire up IEx and try that again we get the same error as the get_by/1 query above, Ecto.MultipleResultsError. Let's try another:

Repo.one(from f in Fisherman,  
         where: f.name == "Lew")

%FishingSpot.Fisherman{__meta__: #Ecto.Schema.Metadata<:loaded>,
 date_of_birth: #Ecto.Date<1976-01-05>,
 fish_landed: #Ecto.Association.NotLoaded<association :fish_landed is not loaded>,
 fishermen_trips: #Ecto.Association.NotLoaded<association :fishermen_trips is not loaded>,
 id: 4, inserted_at: #Ecto.DateTime<2015-09-10T16:51:34Z>, name: "Lew",
 trips: #Ecto.Association.NotLoaded<association :trips is not loaded>,
 updated_at: #Ecto.DateTime<2015-09-10T16:51:34Z>}

Take a look at what is returned from the database. We get a struct that has fields for all the data in the record like :name. Notice that there are placeholders for all the associations with this strange #Ecto.Association.NotLoaded<association :fishermen_trips is not loaded> struct. Ecto does not return data for associated models unless we tell it to explicitly. This struct is like a friendly reminder that we haven't loaded the association. Let's see what happens when we try to do something with this struct.

fisherman = Repo.get_by(Fisherman, name: "Lew")  
fisherman.fish_landed |> Enum.first  
** (UndefinedFunctionError) undefined function: Enum.first/1
    (elixir) Enum.first(#Ecto.Association.NotLoaded<association :fish_landed is not loaded>)

As you can see this makes for a very nice error message. If the association field was simply nil you would get this cryptic message and be left scratching your head.

** (UndefinedFunctionError) undefined function: Enum.first/1
    (elixir) Enum.first(nil)

We can preload the associated records with preload/2:

fisherman = Repo.get_by(Fisherman, name: "Lew")  
fisherman = Repo.preload fisherman, :fish_landed  
Enum.count(fisherman.fish_landed)

249  

Rigging Up

Great! Now that we have all of the Ecto.Repo query methods under our belt we are ready to tie on some more advanced queries.

Typing these queries directly into IEx gets a bit awkward so lets create a file in lib called queries.exs. This file will start out like this:

defmodule FishingSpot.Queries do  
  alias FishingSpot.Repo
  alias FishingSpot.FishLanded

  import Ecto.Query
end  

We will want to load and reload this file into IEx. As we have already seen we can load a file into IEx with c/2. I like to make a shortcut for this since I run it so often (and mistype the filename 90% of the time). Again open the .iex.exs file and add:

q = fn -> c("lib/queries.exs") end  

Now calling q.() in IEx will reload the queries file.


Fish Stories

Of course the main thing any fisherman cares about is the size of the fish they catch. Let's implement a biggest_fish function in our Queries module:

def biggest_fish do  
  Repo.one(
      from fish in FishLanded,
      select: max(fish.length)
    )
end  

Disclaimer: Most Ecto query examples use the first letter of the model as a reference like this, from f in FishLanded, select: max(f.length). I prefer to spell out what I am referencing. There are a few reason for this. I have an allergic reaction to single letter variable names. This is probably because I am easily confused and forget which thing is which in my queries. Also, you will notice that we have a number of models in our schema that start with the same letter. Using only the first letter of a model's name as a reference breaks down very quickly.

Here is what I get when I run that in IEx with FishingSpot.Queries.biggest_fish:

07:21:26.274 [debug] SELECT max(f0."length") FROM "fish_landed" AS f0 [] OK query=1.1ms  
#Decimal<74>

Ecto used the aggregate function max in the select and returned a decimal value of 74. We used one/2 because we knew there would be a single result. We could also have used all/2 but instead of a single value, Ecto would have returned a list with the value inside, [#Decimal<74>]. We also get a list, even when using one/2 if we use a list for the select clause like this:

  Repo.one(
      from fish in FishLanded,
      select: [max(fish.length)]
    )
15:54:30.482 [debug] SELECT max(f0."length") FROM "fish_landed" AS f0 [] OK query=1.6ms  
[#Decimal<74>]

This distinction is helpful to remember as using one style versus another can change how much post-processing you need to do on the data returned.

Did I mention those values are in inches? (Apologies to those metric fishermen out there.)

What!?! A six foot fish? That seems a little fishy. Let's see how many fish we have over two feet. We can start with this:

def fishy_fish do  
  Repo.all(
    from fish in FishLanded,
    where: fish.length > 24
  )
end  

Running that splats out a bunch of records. Not very helpful. Let's count those fish:

def fishy_fish_count do  
  Repo.all(
    from fish in FishLanded,
    select: count(fish.id),
    where: fish.length > 24
  )
end  
06:31:43.126 [debug] SELECT count(f0."id") FROM "fish_landed" AS f0 WHERE (f0."length" > 24) [] OK query=6.1ms  
[211]

As you would assume, Ecto has support for all the basic aggregate functions, max, min, avg, sum, and count.

We should find out who caught these massive fish.

Repo.all(  
  from fish in FishLanded,
  join: fisherman in assoc(fish, :fisherman),
  group_by: fisherman.name,
  select: [max(fish.length), fisherman.name]
)
06:38:29.265 [debug] SELECT max(f0."length"), f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" GROUP BY f1."name" [] OK query=14.3ms  
[[#Decimal<74>, "Mark"], [#Decimal<47>, "Kirk"], [#Decimal<30>, "Lew"],
 [#Decimal<30>, "Joe"]]

(Now that we are warmed up, I'm only going to list the query and the output.)

This is our first join to another model. As you can see we simply tell Ecto what association to follow and it sets up an inner join to that table in the database. We then use the name of the fisherman to group the record. The records don't look like they are in any kind of order. Let's fix that up now.

Repo.all(  
  from fish in FishLanded,
  join: fisherman in assoc(fish, :fisherman),
  group_by: fisherman.name,
  order_by: fisherman.name,
  select: [max(fish.length), fisherman.name]
)

06:45:36.320 [debug] SELECT max(f0."length"), f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" GROUP BY f1."name" ORDER BY f1."name" [] OK query=2.4ms

[[#Decimal<30>, "Joe"], [#Decimal<47>, "Kirk"], [#Decimal<30>, "Lew"],
 [#Decimal<74>, "Mark"]]

Much better.

What if we want to look for only the fisherman who caught the biggest fish? (Pretend we can't just figure it out from the list above) We have a couple of options. Let's first try using the query we already have. Can we just put the result of the biggest_fish/0 function into a where clause?

Repo.all(  
  from fish in FishLanded,
  join: fisherman in assoc(fish, :fisherman),
  where: fish.length == biggest_fish, 
  select: [fish.length, fisherman.name]
)

BOOM!

== Compilation error on file lib/queries.exs ==
** (Ecto.Query.CompileError) variable `biggest_fish` is not a valid query expression. Variables need to be explicitly interpolated in queries with ^
    (ecto) expanding macro: Ecto.Query.where/3
    lib/queries.exs:50: FishingSpot.Queries.biggest_fish_per_fisherman_ordered_max/0
    (ecto) expanding macro: Ecto.Query.select/3
    lib/queries.exs:50: FishingSpot.Queries.biggest_fish_per_fisherman_ordered_max/0
    (ecto) expanding macro: Ecto.Query.from/2
    lib/queries.exs:50: FishingSpot.Queries.biggest_fish_per_fisherman_ordered_max/0

This doesn't compile but Elixir is all about helpful error messages. It tells us exactly what to do. We need to interpolate the result into the second query using the pin operator, ^. Update the where clause to:

where: fish.length == ^biggest_fish,  

Now it compiles and we can try again.

16:05:52.495 [debug] SELECT max(f0."length") FROM "fish_landed" AS f0 [] OK query=1.2ms

16:05:52.497 [debug] SELECT f0."length", f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" WHERE (f0."length" = $1) [#Decimal<74>] OK query=1.4ms  
[[#Decimal<74>, "Mark"]]

As you can see, the result of the first query is used in the second.

But wait, shouldn't we be able to do this in a single query? We can but we need to use some more advanced techniques. Let's try a method that looks very elegant but is always a little hard for me to wrap my brain around.

Repo.all(  
  from fish in FishLanded,
  left_join: bigger_fish in FishLanded, on: fish.length < bigger_fish.length,
  join: fisherman in assoc(fish, :fisherman),
  where: is_nil(bigger_fish.id),
  select: [fish.length, fisherman.name]
)

18:29:37.636 [debug] SELECT f0."length", f2."name" FROM "fish_landed" AS f0 LEFT OUTER JOIN "fish_landed" AS f1 ON f0."length" < f1."length" INNER JOIN "fishermen" AS f2 ON f2."id" = f0."fisherman_id" WHERE (f1."id" IS NULL) [] OK query=163.8ms  
[[#Decimal<74>, "Mark"]]

It starts by joining FishLanded onto itself with a left join. The join should match the left side to the right if bigger_fish.length is greater than fish.length. Then we make sure that the left hand record is returned where there is no matching bigger_fish with where: is_nil(bigger_fish.id). The result is that only the row with the maximum length is returned because there are no other rows with greater lengths.

As is often the case, the solution's elegance comes with a price. Even with a small dataset this query can perform poorly. As you can see on my machine it took 163.8ms.

Here is yet another alternative.

Repo.all(  
  from fish in FishLanded,
  join: fisherman in assoc(fish, :fisherman),
  where: fragment(
    "? IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)", fish.length
  ),
  select: [fish.length, fisherman.name]
)

18:34:16.357 [debug] SELECT f0."length", f1."name" FROM "fish_landed" AS f0 INNER JOIN "fishermen" AS f1 ON f1."id" = f0."fisherman_id" WHERE (f0."length" IN (SELECT MAX(fish.length) FROM fish_landed fish)) [] OK query=1.8ms  
[[#Decimal<74>, "Mark"]]

This time we use the Ecto fragment/1 to put a subquery in our where clause. We grab the maximum length and then use that to return the entire record matching the maximum length using an IN clause. 1.8ms is quite a bit better performance. I think we should settle on this one.

Let's find out more about this monster fish Mark caught.

Repo.one(  
  from fish in FishLanded,
  join: fly_type in assoc(fish, :fly_type),
  join: fish_species in assoc(fish, :fish_species),
  join: fisherman in assoc(fish, :fisherman),
  join: trip in assoc(fisherman, :trips),
  join: locations in assoc(trip, :locations),
  join: location_types in assoc(locations, :location_type),
  where: fragment(
      "? IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)", fish.length
    ),
  where: fish.date_and_time >= trip.start_date,
  where: fish.date_and_time <= date_add(trip.end_date, 1, "day"),
  select: %{
    length: fish.length,
    date_caught: fish.date_and_time,
    fish_type: fish_species.name,
    fly: fly_type.name,
    fisherman: fisherman.name,
    trip_start: trip.start_date,
    trip_end: trip.end_date,
    location: locations.name,
    location_type: location_types.name
 })

18:33:48.875 [debug] SELECT f0."length", f0."date_and_time", f2."name", f1."name", f3."name", t4."start_date", t4."end_date", l5."name", l6."name" FROM "fish_landed" AS f0 INNER JOIN "fly_types" AS f1 ON f1."id" = f0."fly_type_id" INNER JOIN "fish_species" AS f2 ON f2."id" = f0."fish_species_id" INNER JOIN "fishermen" AS f3 ON f3."id" = f0."fisherman_id" INNER JOIN "fishermen_trips" AS f7 ON f7."fisherman_id" = f3."id" INNER JOIN "trips" AS t4 ON t4."id" = f7."trip_id" INNER JOIN "locations_trips" AS l8 ON l8."trip_id" = t4."id" INNER JOIN "locations" AS l5 ON l5."id" = l8."location_id" INNER JOIN "location_types" AS l6 ON l6."id" = l5."location_type_id" WHERE (f0."length" IN (SELECT MAX(biggest_fish.length) FROM fish_landed biggest_fish)) AND (f0."date_and_time" >= t4."start_date") AND (f0."date_and_time" <= (t4."end_date"::date + interval '1 day')::date) [] OK query=634.4ms

%{
  date_caught: #Ecto.DateTime<2014-08-15T06:36:29Z>, 
  fish_type: "Rainbow Trout",
  fisherman: "Mark", 
  fly: "Copper John", 
  length: #Decimal<74>,
  location: "Lake Fork", 
  location_type: "River",
  trip_end: #Ecto.Date<2014-08-18>, 
  trip_start: #Ecto.Date<2014-08-08>
}

Whoa! This is a monster query. We've now joined in all the models and have every possible piece of information about the big fish. There are a couple of points to make about this query. First, I have switched to the select to map syntax so that I can indicate what values return where. With so many return values using a list gets unwieldy. Additionally, there is no way to associate the landing of a fish with the trip the fisherman was on except by comparing dates. I'm looking for the trip where the date the fish was caught falls between the start and end of the trip. There is no between in Ecto so we resort to two where clauses (we could also use and/2 to put the clauses together). One issue arises from comparing the trip dates to the date_and_time field which is a datetime. If the fish was caught at some time on the last day of the trip then it will not be less than or equal to the trip end date. You can see for yourself if you fire up psql that a date type is cast to a timestamp with the time at 00:00:00.

select now() < now()::date;  
 ?column?
----------
 f
(1 row)
select (now()::date)::timestamp;  
         now
---------------------
 2015-09-28 00:00:00
(1 row) 

To solve this issue we use Ecto's date_add/3 function to add a day to the end of the trip so that it effectively falls on midnight of that day.

Oops: When I first wrote the sample data generator in Part 1 I neglected to ensure that the dates used for fish landed were calculated using one of the fisherman's trips. Also, I was missing aliases in some of the models. If you are using my old code you will want to pull the fixes.

I'm going to call it a day on this one. In Part 3 we will look at some other advanced querying techniques as well as Ecto's preloading that allows us to work with a record's associated data. Thanks for joining me on this adventure.

P.S. I have used the "keyword" style for all the Ecto queries in this post. Ecto has an alternative "expression" style. Drew Olson has a great post comparing and combining the two styles. Composable Queries with Ecto


Acknowledgements

A huge thanks to Kim Barnes (@Kimberlydbarnes) for reviewing and editing this post.

Comments

A Quick Dip into Ecto Types

A question came up on the #ecto Slack channel this weekend asking if Ecto supports atoms as a field type. I didn't know the answer but wondered if it could be done with a custom type.

The tl;dr is that it can be done as you can see here: https://github.com/parkerl/curiousecto/blob/customatomectotype/lib/atomtypeproblem.ex

Let's quickly walk through the code. The Ecto.Type documentation clearly outlines how to accomplish creating a custom type http://hexdocs.pm/ecto/Ecto.Type.html.

We must create a module and include the type behaviour with @behaviour Ecto.Type. Next we need to implement 4 functions, type/0, cast/1, load/1, and dump/1.

The type/0 function tells Ecto what "real" database type should back the custom type. The cast/1 function allows us to take values and turn them into something that can be stored in the database. The load/1 function tells Ecto what to do with values coming out of the database and dump/1 handles values going in. The last three expect must return a tuple in the format {:ok, value} or an :error atom.

Here is how to implement the atom field type backed by a string field in the database.

  defmodule AtomType do
    @behaviour Ecto.Type
    def type, do: :string

    def cast(value), do: {:ok, value}

    def load(value), do: {:ok, String.to_atom(value)}

    def dump(value) when is_atom(value), do: {:ok, Atom.to_string(value)}

    def dump(_), do: :error
  end

You can see the database type is set to :string. We don't need any fancy casting in this case so we just return the value inside an :ok tuple. The heavy lifting is all done by dump/1 and load/1. On the way into the database we convert an atom to a string. If an atom is not given we return :error. On the way out, we reverse the process in load/1.

Now we can employ our new AtomType in the model.

  schema "typed_table" do
    field :atom_type, Curious.TypedTable.AtomType
    timestamps
  end

We can insert a record with the atom_type field set to :monkey.

%Curious.TypedTable{atom_type: :monkey} |> Curious.Repo.insert!

And we can query data back out.

Curious.Repo.all(from t in Curious.TypedTable, where: (t.atom_type == ^:monkey))  

One gotcha, notice the use of the pin (^) operator above? We have to use this to interpolate the atom literal into the query. I was stuck for a bit because when you forget the pin the code fails to compile.

== Compilation error on file lib/atom_type_problem.ex ==
** (Ecto.Query.CompileError) `:monkey` is not a valid query expression
    expanding macro: Ecto.Query.where/3
    lib/atom_type_problem.ex:62: Curious.TypedTable.run_example/0
    expanding macro: Ecto.Query.from/2
    lib/atom_type_problem.ex:62: Curious.TypedTable.run_example/0

A huge thanks to José Valim for clearing this up for me...and for so many other things.

I'm not sure what the use case might be for using atoms over strings but it was a fun exercise figuring this out. If you think of a goos one please let me know.

Also a shout out to Karol Wojtaszek (@karol) on the Elixir Slack channel for prompting the idea behind this post.

Comments

Diving into Ecto: Part 1

First, a disclaimer. I have not attempted to blog in quite some time. I'm a bit rusty and I thank you in advance for your patience. Also, I'm an Elixir wannabe and by no means an expert. If I'm doing things wrong or showing bad examples please let me know.

I want to take a deeper look at Elixir's Ecto library. Going beyond the surface requires us to setup a data model with at least a little bit of complexity and generate an interesting dataset. That much should make for a decent sized post. I will defer querying until the next post. I'm attempting to give enough detail to explain what needs to be done without dumping in the entire source. You can find the code in all its hacky glory here:

https://github.com/parkerl/fishing_spot

I have chosen fly fishing as the muse for this series of posts. I know the domain well and enjoy day dreaming about big fish. I'm hoping I can convince my wife that I need to do some field research. Fly fishermen who have more time on their hands than I do often go on multi-day trips. On the trip they fish at multiple locations which might include rivers, streams or lakes. When a fish is landed the fisherman wants to record the location of the catch, the weight, length and species of the fish, and the type of lure or fly used. (The accuracy of this information can be somewhat suspect but that is a different topic altogether).

The schema drawn below attempts to capture in a very normalized way the data described above.

Let's get started!

Run mix new fishing_spot --sup && cd fishing_spot.

We are going to use postgres as our backend. I'm going to assume you already have it setup. A quick "google" should help if you don't.

Edit mix.ex and add the postgrex and ecto dependencies to the deps/1 function:

  defp deps do
    [
      {:postgrex, "~> 0.9.1"},
      {:ecto, "~> 1.0.0"}
    ]
  end

Add both to the applications list:

  def application do
    [applications: [:logger, :postgrex, :ecto],
     mod: {FishingSpot, []}]
  end

Run mix deps.get && mix deps.compile.

Next we need to create a Repo for the project. The Repo is basically the project's connection to the database.

Run mix ecto.gen.repo. This should create a file lib/fishing_spot/repo.ex with the following content:

defmodule FishingSpot.Repo do  
  use Ecto.Repo, otp_app: :fishing_spot
end  

When mix ecto.gen.repo is finished, it helpfully prints this message:

Don't forget to add your new repo to your supervision tree  
(typically in lib/fishing_spot.ex):

    worker(FishingSpot.Repo, [])

Open up lib/fishing_spot.ex and you should see these lines:

    children = [
      # Define workers and child supervisors to be supervised
      # worker(FishingSpot.Worker, [arg1, arg2, arg3])
    ]

We need to add our repo into that list:

children = [  
  worker(FishingSpot.Repo, [])
]

Edit the config/config.exs and we can see the command also added the following to our config file:

config :fishing_spot, FishingSpot.Repo,  
  adapter: Ecto.Adapters.Postgres,
  database: "fishing_spot_repo",
  username: "user",
  password: "pass",
  hostname: "localhost"

Let's change both username and password to fishing_spot in config/config.exs. We need to create that user in postgres. On my machine I do this by running psql -d template1. At the psql prompt our user can be created with:

CREATE USER fishing_spot WITH CREATEDB PASSWORD 'fishing_spot';  

The Schema

Migrations

Let's create migrations to setup our schema in the database. We could use a single migration file but I'm going to create a migration for each table just for fun. Create a file called migrations with the following content:

add_fly_types_table  
add_fish_species_table  
add_trips_table  
add_location_types_table  
add_locations_table  
add_fishermen_table  
add_fish_landed_table  
add_locations_trips_table  
add_fishermen_trips_table  

When you want to execute the same command many times with different inputs xargs is your best bet. By executing xargs -p -L 1 mix ecto.gen.migration < migrations we tell xargs to use each line of the file as the arguments for the mix command. Using the -p flag will tell xargs to ask before executing each command. This is useful here because the migrations are timestamped. Executing without pausing for confirmation can cause mix to create migrations with the same timestamp.

Migration files will be created in the priv/repo/migrations directory. Here are the files I have:

20150520032648_add_fly_types_table.exs  
20150520032650_add_fish_species_table.exs  
20150520032652_add_trips_table.exs  
20150520032656_add_location_types_table.exs  
20150520032657_add_locations_table.exs  
20150520032658_add_fishermen_table.exs  
20150520032700_add_fish_landed_table.exs  
20150520032703_add_locations_trips_table.exs  
20150520032705_add_fishermen_trips_table.exs  

Next I need to enter all the migration code into those files. Here is the change/0 function for the fish_landed migration which is probably the most complicated. Relationships to other tables are setup with references.

  def change do
    create table(:fish_landed) do
      add :date_and_time,    :datetime
      add :weight,           :decimal
      add :length,           :decimal

      add :fisherman_id, references(:fishermen)
      add :location_id, references(:locations)
      add :fly_type_id, references(:fly_types)
      add :fish_species_id, references(:fish_species)

      timestamps
    end
  end

I'm not going to list the code for the rest of the migrations. You can either copy them from the repo or take a stab at building them out yourself.

Run mix do ecto.create, ecto.migrate. This will create the database and run all the migrations. Let's take a look at the table that is created. Fire up psql with psql -d fishing_spot_repo and then enter \d fish_landed.

         Column      |            Type             |   
-----------------+-----------------------------+---------
 id              | integer                     | not null
 date_and_time   | timestamp without time zone |
 weight          | numeric                     |
 length          | numeric                     |
 fisherman_id    | integer                     |
 location_id     | integer                     |
 fly_type_id     | integer                     |
 fish_species_id | integer                     |
 inserted_at     | timestamp without time zone | not null
 updated_at      | timestamp without time zone | not null
Indexes:  
    "fish_landed_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:  
    "fish_landed_fish_species_id_fkey" FOREIGN KEY (fish_species_id) REFERENCES fish_species(id)
    "fish_landed_fisherman_id_fkey" FOREIGN KEY (fisherman_id) REFERENCES fishermen(id)
    "fish_landed_fly_type_id_fkey" FOREIGN KEY (fly_type_id) REFERENCES fly_types(id)
    "fish_landed_location_id_fkey" FOREIGN KEY (location_id) REFERENCES locations(id)

The table is created with all the correct field types for postgres and the foreign key constraints are added. However, indexes are not created for foreign keys by default. Let's add one to the fisherman_id column. We can add the following to the last line of the change/0 function above.

create index(:fish_landed, :fisherman_id)  

The easiest way to rerun the migrations is to completely drop and recreate the database. Run mix do ecto.drop, ecto.create, ecto.migrate.

Oh no! That didn't work. We get a common error no function clause matching. While this error message is not very helpful, it usually indicates that the pattern match of a function failed.

** (FunctionClauseError) no function clause matching in Ecto.Migration.index/3
    (ecto) lib/ecto/migration.ex:391: Ecto.Migration.index(:fish_landed, :fisherman_id, [])
    _build/dev/lib/fishing_spot/priv/repo/migrations/20150521042113_add_fish_landed_table.exs:18: FishingSpot.Repo.Migrations.AddFishLandedTable.change/0
    (stdlib) timer.erl:197: :timer.tc/3
    (ecto) lib/ecto/migration/runner.ex:22: Ecto.Migration.Runner.run/6
    (ecto) lib/ecto/migrator.ex:113: Ecto.Migrator.attempt/6
    (ecto) lib/ecto/migrator.ex:63: anonymous fn/4 in Ecto.Migrator.do_up/4
    (ecto) lib/ecto/pool.ex:292: Ecto.Pool.with_rollback/3
    (ecto) lib/ecto/adapters/sql.ex:566: Ecto.Adapters.SQL.transaction/8

Sure enough, the index/3 function takes a list as the second argument. Let's try again with

create index(:fish_landed, [:fisherman_id])  

Much better. Now we can see in psql that the index was created.

"fish_landed_fisherman_id_index" btree (fisherman_id)

Hmm...I wonder how far we can push Ecto's migrations? Does Ecto support composite primary keys? Let's see if we can find out. I see from the docs here that we can indicate which column is the primary key by passing primary_key: true. Maybe we should try:

    create table(:all_the_things) do
      add :some_string, :string, primary_key: true
      add :someother_string, :string, primary_key: true

      timestamps
    end

Uh oh! Things are not looking good. I get this error:

06:17:56.322 [info]  create table all_the_things  
** (Postgrex.Error) ERROR (invalid_table_definition): multiple primary keys for table "all_the_things" are not allowed
    (ecto) lib/ecto/adapters/sql/worker.ex:26: Ecto.Adapters.SQL.Worker.query!/4
...    

Apparently support has not been added for composite keys. It looks like it was discussed here but did not make it into ecto 1.0.

Let's see what else we can do. Can we use a string as a primary key?

create table(:all_the_things) do  
  add :some_string, :string, primary_key: true

  timestamps
end  

WTF?!@#$ When we migrate we get the same error.

** (Postgrex.Error) ERROR (invalid_table_definition): multiple primary keys for table "all_the_things" are not allowed

It turns out we have to turn off the creation of the default id column. When can pass primary_key: false to table/2 like so:

create table(:all_the_things, primary_key: false) do  
  add :some_string, :string, primary_key: true

  timestamps
end  

Creating Models

Next we need to flesh out the models.

Run:

mkdir lib/fishing_spot/models  
touch lib/fishing_spot/models/{fish_landed,fish_species,fisherman,fisherman_trip,fly_type,location,location_trip,location_type,trip}.ex  

Again I'm not going to show the code for all the models. Here is the FishLanded model which has several relationships to other models:

defmodule FishingSpot.FishLanded do  
  alias FishingSpot.Fisherman
  alias FishingSpot.FishLanded
  alias FishingSpot.FishSpecies
  alias FishingSpot.FlyType
  use Ecto.Model

  schema "fish_landed" do
    timestamps
    field :date_and_time,    Ecto.DateTime
    field :weight,           :decimal
    field :length,           :decimal

    belongs_to :fisherman,     Fisherman
    belongs_to :location,      Location
    belongs_to :fly_type,      FlyType
    belongs_to :fish_species,  FishSpecies
  end
end  

As you can see, we have to specify all the fields and relationships within the model.

Let's take a look at the fisherman.ex file as well.

defmodule FishingSpot.Fisherman do  
  alias FishingSpot.FishermanTrip

  use Ecto.Model

  schema "fishermen" do
    timestamps
    field :name
    field :date_of_birth, Ecto.Date

    has_many :fishermen_trips, FishermanTrip
    has_many :trips, through: [:fishermen_trips, :trip]
    has_many :fish_landed, FishLanded
  end
end  

In this model you can see a "has many through" relationship from fishermen to trips.

Finally we are ready to create records. Fire up iex -S mix and enter the following:

FishingSpot.Repo.insert %FishingSpot.Fisherman{name: 'Ahab'}  

Damn!

** (Ecto.ChangeError) value `'Ahab'` for `FishingSpot.Fisherman.name` in `insert` does not match type :string
      (ecto) lib/ecto/query/planner.ex:33: anonymous fn/6 in Ecto.Query.Planner.fields/4
    (stdlib) lists.erl:1262: :lists.foldl/3
      (ecto) lib/ecto/query/planner.ex:21: Ecto.Query.Planner.fields/4
      (ecto) lib/ecto/repo/model.ex:67: anonymous fn/10 in Ecto.Repo.Model.insert/4

Pattern matching strikes again. This is yet another common gotcha in Elixir. In languages like Ruby and Javascript single and double quotes can be used mostly interchangeably. In Elixir single quotes denote a Char list. We have to use double quotes for strings.

FishingSpot.Repo.insert %FishingSpot.Fisherman{name: "Ahab"}  

BOOM!

** (ArgumentError) repo FishingSpot.Repo is not started, please ensure it is part of your supervision tree
    (ecto) lib/ecto/adapters/sql.ex:548: Ecto.Adapters.SQL.transaction/3

If you were paying attention above you will not get this error. However, I did not heed my own advice and forgot to add my Repo to the supervision tree in lib/fishing_spot.ex.

Once we fix that up we can finally see the record inserted. Ecto helpfully logs the exact SQL that is sent to postgres:

16:12:23.749 [debug] BEGIN [] OK query=297.5ms queue=15.0ms

16:12:23.845 [debug] INSERT INTO "fishermen" ("date_of_birth", "inserted_at", "name", "updated_at") VALUES ($1, $2, $3, $4) RETURNING "id" [nil, {{2015, 9, 6}, {22, 12, 23, 0}}, "Ahab", {{2015, 9, 6}, {22, 12, 23, 0}}] OK query=3.8ms

16:12:23.855 [debug] COMMIT [] OK query=3.0ms  

Sample Data

Now that we have the Ecto models fleshed out and everything working, it's time to generate some data to play with. Let's create a file priv/repo/seeds.exs with a module called FishingSpot.Data to do the work via the generate/0 function.

We will want to run this function over and over as we build up our implementation. Let's add this line outside the FishingSpots.Data module at the very end of the file:

FishingSpot.Data.generate  

This will cause the data generation function to be run every time we load this file.

We can run it with either mix run priv/repo/seeds.exs or by firing up IEx with iex -S mix. Within the IEx session run c("priv/repo/seeds.exs") to compile and run the file.

The first thing we will want is to ensure the database is in a pristine state before generating data. The easiest way to do that is to drop and recreate the database. We can simply lean on Ecto's built in mix tasks.

The generate/0 function should look like:

  def generate do
    Mix.Task.run "ecto.drop",     ["FishingSpot.Repo"]
    Mix.Task.run "ecto.create",   ["FishingSpot.Repo"]
    Mix.Task.run "ecto.migrate",  ["FishingSpot.Repo"]
    _generate
  end

(Note that I have aliased all the necessary FishingSpot modules so I don't have to prefix them.)

The actual data generation will happen in the private function _generate/0. Let's flesh that out now. It would be nice to start with some fish species. In Ecto, records are created by making a struct that has the same name as the model, and keys with the names of the underlying table columns like %ModelName{attribute: "some value"}. This struct is then passed to the Repo which handles persistence. We can insert a record for Rainbow Trout and see what ID is assigned by the database.

rainbow_trout = Repo.insert %FishSpecies{ name: "Rainbow Trout" }  
IO.puts rainbow_trout.id  

However, when we do this and try to run it we get the following error:

== Compilation error on file priv/repo/seeds.exs ==
** (UndefinedFunctionError) undefined function: :ok.id/1 (module :ok is not available)
    :ok.id({:ok, %FishingSpot.FishSpecies{__meta__: #Ecto.Schema.Metadata<:loaded>, id: 11, inserted_at: #Ecto.DateTime<2015-08-25T11:57:20Z>, name: "Rainbow Trout", updated_at: #Ecto.DateTime<2015-08-25T11:57:20Z>}})
    priv/repo/seeds.exs:24: FishingSpot.Data._generate/0
    (elixir) lib/kernel/parallel_compiler.ex:95: anonymous fn/4 in Kernel.ParallelCompiler.spawn_compilers/8

What in the world is going on here? The problem is that Ecto's insert/2 function follows the Erlang idiom of returning a tuple where the first element is an atom indicating the success or failure of the operation. The second element will be the newly inserted record. So in the code above the rainbow_trout variable would end up set to the atom :ok which is the idiomatic indicator for a successful operation. Of course the atom :ok doesn't have an id function, which causes the error. Looking more closely at the error message, we can see something interesting. The part (module :ok is not available) shows that Elixir is actually looking for a module called ok. This is because Elixir uses atoms for module names under the hood. This can lead to confusion if you have a bug like I did above where you accidentally call a function on an atom.

We have two options to fix our issue. We can pattern match the tuple to put the new record into the variable like so:

{:ok, rainbow_trout } = Repo.insert %FishSpecies{ name: "Rainbow Trout" }

or we can use Ecto's insert!/2 function.

rainbow_trout = Repo.insert! %FishSpecies{ name: "Rainbow Trout" }  

Let's use the latter for simplicity. We can write similar insert!/2 calls for the models FlyType, and LocationType.

Next we need to insert our fishermen using the Fisherman model. The Fisherman model has a date field called date_of_birth. This is the first time we've encountered one of Ecto's special datatypes, Ecto.Date. In order to create an Ecto.Date we again build a struct with year, month, and day keys. Inserting a fisherman then looks like:

mark = Repo.insert! %Fisherman{  
  name: "Mark",  
  date_of_birth: %Ecto.Date{ 
    year: 1970, 
    month: 1, 
    day: 2 
  }
}

The next piece of the puzzle is to connect records via foreign keys. For instance, Locations need a LocationType. We can simply put the id of the referenced record into the foreign key column of the other.

river = Repo.insert! %LocationType{ name: "River" }

white_river = Repo.insert! %Location{  
  name: "North Fork White River", 
  altitude: 5000, 
  lat: Decimal.new(40.051879), 
  long: Decimal.new(-107.458016), 
  location_type_id: river.id 
}

Now with that out of the way it's time to catch some fish! Or at least pretend like we did in the age old fishing tradition.

We will want to randomly assign all the attributes and associations of a bunch of FishLanded records to make things interesting. Let's see how we can accomplish that.

First we need some randomness and this is how we get it:

:random.seed(:erlang.now)

Next we need a big loop. Enum.each(1..1000, fn(_) -> should do the trick. Inside, we first need to randomly grab the records that should be associated with each FishLanded. I've stored all the records created up to this point in lists for this purpose. To get a random record from each let's try:

fisherman = Enum.shuffle(fishermen) |> List.first  
location = Enum.shuffle(locations) |> List.first  
fly = Enum.shuffle(flies) |> List.first  
trip = Enum.shuffle(trips) |> List.first  
fish = Enum.shuffle(fish_types) |> List.first  

Now we need to come up with a reasonable size for the fish.

length = Decimal.new(:random.uniform(30))  
weight = Decimal.new(:random.uniform(5))  

Finally we need to determine the date and time the fish was caught. We should make sure it happened between the start and end of the trip that was selected above. Conveniently in my made up data no trip spans multiple months. We can simply grab the starting day and ending day and choose a random day in between. Let's also just use the current local time. We can then use Ecto.DateTime.from_date_and_time to build the value we need to insert the records into the database.

%Ecto.Date{year: year, month: month, day: trip_start} = trip.start_date
%Ecto.Date{day: trip_end} = trip.end_date

trip_length = trip_end - trip_start  
day_caught = :random.uniform(trip_length)  
date_caught = %Ecto.Date{year: year, month: month, day: trip_start + day_caught}

date_and_time_caught = Ecto.DateTime.from_date_and_time(  
date_caught,  
Ecto.Time.local())  

Putting it all together we now can insert our randomized records:

Repo.insert %FishLanded{  
  date_and_time: date_and_time_caught, 
  weight: weight, 
  length: length, 
  fisherman_id: fisherman.id, 
  location_id: location.id, 
  fly_type_id: fly.id, 
  fish_species_id: fish.id
}

Here is my final version of seeds.exs.

We have 1000 lies...I mean records...in the database and we are ready to go fishing for some data. Stay tuned for my next post on Ecto's query interface.


Thanks!

I owe a huge thanks to my team at ATHN, Kim Barnes, David Madouros, Andrew Courter and Joe Santini. They reviewed the multiple iterations of this post and without their help it would be even more of a train wreck.

Also props to the entire Elixir community for being so welcoming and fun!

Comments