Avoid using Doctrine’s Collection::matching method

If you are familiar with Doctrine’s Doctrine\Common\Collections\Selectable collections (see docs), you probably know the power of the matching method that this interface provides.

Basically, it lets you fetch a filtered subset of related entities directly from within your entity:

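use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;
use Doctrine\Common\Collections\Criteria;
use Doctrine\ORM\Mapping as ORM;

class Company {

    /**
     * @var Collection<int,User>|User[]
     *
     * mappedBy is required on a one-to-many relation; the "company"
     * property on User is assumed here.
     *
     * @ORM\OneToMany(targetEntity=User::class, mappedBy="company")
     */
    private Collection $users;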

    public function __construct() {
        $this->users = new ArrayCollection();
    }

    /**
     * @return User[]
     */
    public function getActiveUsers(): array
    {
        $criteria = Criteria::create()->andWhere(
            Criteria::expr()->eq('active', true)
        );
        return $this->users->matching($criteria)->toArray();
    }
}

Great, now you can easily get only the active users of the company without needing a repository or building a query with Doctrine\ORM\QueryBuilder. The Selectable interface is implemented by:

  • Doctrine\Common\Collections\ArrayCollection
  • Doctrine\ORM\PersistentCollection
    • Used when the entity is initialized by Doctrine (that is, loaded from the database).

Let’s dig deeper into the case where you fetched the entity from the database and the collection is not yet initialized. An uninitialized collection is one on which we have not yet called any method that loads the full result, such as count, toArray, or isEmpty. The exact list of such methods differs between LAZY relations (the default) and EXTRA_LAZY ones. Once the collection is initialized, all its entities are held locally and every operation can be performed without touching the database.

Calling the matching method, however, does not initialize the collection. That makes it very performant, because it loads only the needed data from the database. Even if the company has thousands of users, only the active ones get hydrated:

$company = $entityManager->find(Company::class, 1);
$users = $company->getActiveUsers();

So, in this example, if the company has just been loaded, the collection stays uninitialized, and every call to getActiveUsers runs another database query until something initializes the collection.
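While debugging, it can help to check which state a collection is in. The helper below is only a sketch: the function name isFullyLoaded is made up for this illustration, and it assumes doctrine/collections is installed and autoloaded through Composer. It relies on PersistentCollection extending AbstractLazyCollection, which exposes isInitialized().

```php
<?php

declare(strict_types=1);

use Doctrine\Common\Collections\AbstractLazyCollection;
use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;

/**
 * Hypothetical debugging helper (not part of Doctrine):
 * tells whether all elements of a collection are already in memory.
 */
function isFullyLoaded(Collection $collection): bool
{
    // Lazy collections (PersistentCollection among them) track their own state.
    if ($collection instanceof AbstractLazyCollection) {
        return $collection->isInitialized();
    }

    // Plain in-memory collections such as ArrayCollection have nothing to load.
    return true;
}
```

Called on $this->users inside Company, this should keep returning false before and after getActiveUsers, because matching leaves the collection uninitialized.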

Now, here comes the important part. Because of how the identity map works, if you have just deactivated a user of your company and have not flushed yet, getActiveUsers may still return that user. Why?

It’s because when Doctrine fetches an entity from the database, it checks whether that entity already exists in the identity map and, if so, returns the existing instance. It does this without refreshing the entity’s data! So the following code may throw an exception (if user #1 was an active user of company #1):

$user = $entityManager->find(User::class, 1);
$user->deactivate();

$company = $entityManager->find(Company::class, 1);

foreach ($company->getActiveUsers() as $user) {
    if ($user->isInactive()) {
        throw new LogicException('Can never happen, I fetched only active!'); // sadly, it can!
    }
}

This can lead to bugs that are very hard to find. There are a few ways to avoid this problem:

  • The safe one: flush after deactivating the user.
    • That way, the database and the in-memory data are in sync, and getActiveUsers reflects the deactivation you just made.
    • The problem is that you need to know that somebody will call the dangerous method. And you definitely don’t want to flush after every entity change.
  • The ineffective one: filter out inactive users in PHP instead of using matching.
    • That way, the full collection is fetched, the user from the identity map is part of it, and your code filters that user out.
    • The problem is that if the full collection is huge, you can run into performance issues (time and memory) when thousands of entities get hydrated.
  • The tricky one: call matching twice.
    • That way, you filter out inactive users in the database AND in memory, so it does not matter that the two are currently out of sync. PersistentCollection::matching returns an ArrayCollection with the partial result fetched from the database, and it lets you call matching again. This time, however, the filtering happens in memory. So the second call “fixes” the problem that the deactivated user was reused from the identity map with its in-memory data instead of being hydrated from the database like the rest of the active users.
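The “tricky one” could look roughly like the sketch below. It is an illustration rather than a drop-in implementation: the ORM mapping is left out for brevity, addUser and User::isActive are assumed helpers that the original example does not show, and the in-memory pass relies on the active field being readable from the User object (here through the isActive accessor).

```php
<?php

declare(strict_types=1);

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;
use Doctrine\Common\Collections\Criteria;

// Minimal stand-in for the User entity (mapping omitted, methods assumed).
class User
{
    private bool $active = true;

    public function isActive(): bool
    {
        return $this->active;
    }

    public function deactivate(): void
    {
        $this->active = false;
    }
}

class Company
{
    /** @var Collection<int, User> */
    private Collection $users;

    public function __construct()
    {
        $this->users = new ArrayCollection();
    }

    // Assumed helper, only here so the sketch can be exercised.
    public function addUser(User $user): void
    {
        $this->users->add($user);
    }

    /**
     * @return User[]
     */
    public function getActiveUsers(): array
    {
        $criteria = Criteria::create()->andWhere(
            Criteria::expr()->eq('active', true)
        );

        return $this->users
            ->matching($criteria) // on an uninitialized PersistentCollection: filtered in the database
            ->matching($criteria) // on the returned collection: filtered again in memory
            ->toArray();
    }
}
```

With a plain ArrayCollection, as in this stand-alone sketch, both passes run in memory. The difference only shows on a company loaded by Doctrine, where the first call goes to the database and the second one re-checks the hydrated objects.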

Knowing that, you probably wonder about the inverse case. What if I recently activated some user?

$reactivatedUser = $entityManager->find(User::class, 1);
$reactivatedUser->activate();

$company = $entityManager->find(Company::class, 1);

foreach ($company->getActiveUsers() as $user) {
    if ($user === $reactivatedUser) {
        // may never be reached, even when the reactivated user belongs to our company
    }
}

In this case, the problem is much bigger, since the first filtering is performed in the database and will never reflect the in-memory changes you just made. So the double call does not help here, and there is no simple solution for this case. You can pick one of the other solutions mentioned above, or manually notify the company entity whenever a user gets activated, which is a somewhat complicated solution.
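For comparison, the “ineffective” in-PHP filter covers both the deactivation and the reactivation case, at the cost of loading the whole collection. A minimal sketch, again with the mapping omitted and with addUser, activate, deactivate, and isActive as assumed helpers:

```php
<?php

declare(strict_types=1);

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;

// Minimal stand-in for the User entity (mapping omitted, methods assumed).
class User
{
    private bool $active = false;

    public function isActive(): bool
    {
        return $this->active;
    }

    public function activate(): void
    {
        $this->active = true;
    }

    public function deactivate(): void
    {
        $this->active = false;
    }
}

class Company
{
    /** @var Collection<int, User> */
    private Collection $users;

    public function __construct()
    {
        $this->users = new ArrayCollection();
    }

    // Assumed helper, only here so the sketch can be exercised.
    public function addUser(User $user): void
    {
        $this->users->add($user);
    }

    /**
     * @return User[]
     */
    public function getActiveUsers(): array
    {
        // filter() initializes the whole collection and then works purely on
        // in-memory state, so unflushed activate()/deactivate() calls are seen.
        return $this->users
            ->filter(static fn (User $user): bool => $user->isActive())
            ->toArray();
    }
}
```

Because the check runs against the objects held in the identity map, a user activated a moment ago is included and one deactivated a moment ago is excluded, whether or not the change has been flushed.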

So, the easiest option is to avoid the matching method unless you really need it for performance reasons. If you do, be aware of the problems described in this article.

About author:

Jan Nedbal

Architect & developer at ShipMonk


