Vlad's blog

In programming veritas

Task Manager: data access layer

leave a comment »

This is the third post of a series of posts on developing Web applications using ASP.Net MVC. The previous two posts are available here:

The source code is available on CodePlex.

The data access layer allows application components to abstract from implementation details of a specific database access technology. Since we use ORM, we already have a substantial level of abstraction that is implemented in the ORM, in our case, Entity Framework. In the case of a simple application like Task Manager, perhaps, there is no need for a separate level of abstraction to isolate parts of the EF from other system components. The only advantage here is that we can easily migrate to another ORM, such as NHibernate. On the other hand, it does not require much effort and demonstrates the good approach in terms of application architecture.
In this case, the standard solution is to use a pattern Repository in conjunction with Unit of work .
From a design standpoint it is important to decide which entity requires a separate repository, and what is not. Eric Evans in his book about DDD recommends creating a repository for each aggregate root. In our case we have three aggregate roots: Project, Task, User, respectively, we have three repositories:

  • IProjectRepository
  • ITaskRepository
  • IUserRepository

Since all the repositories have a standard set of CRUD operations, plus methods for searching and filtering, we can define the common interface they must implement.

public interface IRepository<TEntity, TKey>
    where TEntity: IEntity<TKey>
{
    IQueryable<TEntity> All { get; }
    IQueryable<TEntity> AllIncluding(params Expression<Func<TEntity, object>>[] includeProperties);
    TEntity Find(TKey id);
    void Insert(TEntity entity);
    void Update(TEntity entity);
    void Delete(TKey id);
}

All methods should be obvious, except, perhaps, AllIncluding, but I will tell about it later.
Generic interface IRepository has two parameters: TEntity – that defines the serialized class and TKey – type of the primary key (usually int). TEntity must implement the interface IEntity, which defines the Id property to store the primary key values.

public interface IEntity<T>
{
    T Id { get; set; }
}

Pattern Unit of work is implemented using the interface IUnitOfWork, which is available to clients along with a specific repository.

public interface IUnitOfWork
{
    void Save();
}

Entity Framework Code First

Unlike previous versions of EF Code First there is no need to explicitly specify the database schema and objects mapping. EF uses the domain model classes to create the database model. The starting point is a class that derives from DbContext, which defines the entities that will be serialized in the database.

public class TaskManagerDbContext : DbContext, IUnitOfWork
{
    public TaskManagerDbContext()
        : base("TaskManager")
    {
    }

    public DbSet<User> Users { get; set; }
    public DbSet<Task> Tasks { get; set; }
    public DbSet<Project> Projects { get; set; }
    public DbSet<Comment> Comments { get; set; }
    public DbSet<ProjectLogEntry> ProjectLogs { get; set; }
    public DbSet<TaskLogEntry> TaskLogs { get; set; }

    public void Save()
    {
        base.SaveChanges();
    }
}

TaskManagerDbContext EF knows which classes will be serialized to the database. (Note: the database connection string is taken from the name that is passed to the base class constructor. The name must be present in the configuration file, otherwise it will attempt to connect to the default database). EF analyzes the metadata to create the table structure and establish associations between them. It uses some conventions. For example, a field that specifies the primary key must be named classnameId or Id. For this purpose, you can also use Data Annotation. You can specify the primary key explicitly by using the attribute [Key]. Here are three possible declarations.

public class Project: IEntity<int>
{
    [Key]
    public int Id { get; set; }

    // public int Id { get; set; }

    // public int ProjectId { get; set; }
}

Table dbo.Projects with primary key named Id will be created to represent Project class. All public properties will be available as columns in dbo.Projects. In addition, there is an opportunity to implement their own rules by which EF will build a model of the database. To do this use Pluggable Conventions.
Conventions are used to establish associations between tables. If the EF discovers the property declared as virtual ICollection, it assumes that one-to-many association exists between Project and Task.

public class Project: IEntity<int>
{
    public virtual ICollection<Task> Tasks { get; private set; }
}

Every .Net collection that supports Add/Delete operation can be used. In addition Task must have a public property of Project type.

public class Task : IEntity<int>
{
    [Key]
    public int Id { get; set; }
    public virtual Project Project { get; private set; }

There are enough attributes , which can be used for annotation of properties. If for some reason you are not satisfied with Code First’s conventional behavior, you can use Code First Fluent API , which allow you to explicitly specify the database model.
DbSet class provides all functionality needed for implementation of Generic Repository pattern.

public class Repository<TEntity, TKey> : IRepository<TEntity, TKey>
    where TEntity: class, IEntity<TKey>
{
    private readonly DbContext _context;
    private readonly DbSet<TEntity> _dbSet;

    public Repository(DbContext context, DbSet<TEntity> dbSet)
    {
        _context = context;
        _dbSet = dbSet;
    }

    public IQueryable<TEntity> All
    {
        get { return _dbSet; }
    }

    public IQueryable<TEntity> AllIncluding(params Expression<Func<TEntity, object>>[] includeProperties)
    {
        IQueryable<TEntity> query = _dbSet;
        foreach (var includeProperty in includeProperties)
        {
            query = query.Include(includeProperty);
        }
        return query;
    }

    public TEntity Find(TKey id)
    {
        return _dbSet.Find(id);
    }

    public void Insert(TEntity entity)
    {
        _dbSet.Add(entity);
    }

    public void Update(TEntity entity)
    {
        _context.Entry(entity).State = EntityState.Modified;
    }

    public void Delete(TKey id)
    {
        var entity = _dbSet.Find(id);
        _dbSet.Remove(entity);
    }
}

The concrete repository class is usually inherited from the Repository to implement the desired functionality.

internal class TaskRepository : Repository<Task,int>, ITaskRepository
{
    public TaskRepository(TaskManagerDbContext context)
        : base(context, context.Tasks)
    {
    }
}

Performance issues

ORM makes life easier for the programmer, allowing you to not think about how to make a SQL query and map relational model into objects graph. At the same time work with ORM has a number of nuances, a proper understanding and use of which affects the performance of application. One of them is reading related data.
Lazy load
When the entity is first read, related data isn’t retrieved. For example, if we loaded an instance of Project, then the related tasks will not be loaded. However, the first time you attempt to access the Tasks property, the tasks are automatically retrieved. In the following example, we load a list of all projects. Then, for each project, get tasks, which are stored in the taskList.

var taskList = new List<string>();

var projectList = projectRepository.All.ToList();

foreach (var project in projectList)
{
    foreach (Task task in project.Tasks)
    {
        taskList.Add(task.Summary);
    }   
}

The following SQL command is generated in order to select list of projects.

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
FROM [dbo].[Projects] AS [Extent1]

As we go through the loop on all the projects the following SQL command is executed for each project.

exec sp_executesql N'SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[ProjectId] AS [ProjectId], 
[Extent1].[Summary] AS [Summary], 
FROM [dbo].[Tasks] AS [Extent1]
WHERE [Extent1].[ProjectId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1

In this case EF brings the results back one row at a time. This is incredibly inefficient. This is anti-pattern known as SELECT N+1. But this does not mean that lazy load is always bad. When you need to access an entity’s navigation properties only infrequently or only for a small portion of a set of entities you’re processing, lazy loading is convenient. In order to turn off lazy loading for property you should omit the virtual keyword when you declare the property. If you want to disable lazy loading set LazyLoadingEnabled = false.

public TaskManagerDbContext()
    : base("TaskManager")
{
    Configuration.LazyLoadingEnabled = false;
}

Eager load
In practice it is often more efficient to load all the required data in a single query than in a series of SQL commands. Eager loading allows retrieving related data along with entity. This feature is available as Include extension method. In our case we use method IProjectRepository.AllIncluding.

var taskList = new List<string>();

var projectList = projectRepository.AllIncluding(p => p.Tasks).ToList();

foreach (var project in projectList)
{
    foreach (Task task in project.Tasks)
    {
        taskList.Add(task.Summary);
    }
}

This results in a single join query that retrieves all of the data that’s needed.

SELECT 
[Project1].[Id] AS [Id], 
[Project1].[Name] AS [Name], 
[Project1].[Id1] AS [Id1], 
[Project1].[ProjectId] AS [ProjectId], 
[Project1].[Summary] AS [Summary] 
FROM ( SELECT 
	[Extent1].[Id] AS [Id], 
	[Extent1].[Name] AS [Name], 
	[Extent2].[Id] AS [Id1], 
	[Extent2].[ProjectId] AS [ProjectId], 
	[Extent2].[Summary] AS [Summary], 
	CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
	FROM  [dbo].[Projects] AS [Extent1]
	LEFT OUTER JOIN [dbo].[Tasks] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProjectId]
)  AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC

Entity Framework Profiler is usefull tool for tuning the performance of Entity Framework. They also have a good description of common problems related to ORM.

How to choose ORM

When you design data access layer you should think what particular ORM to choose. Of course, Entity Framework or NHibernate provide many features, but do not forget about their complexity. If you need to only read data from the database, you should consider alternative ORM.

Advertisements

Written by vsukhachev

January 3, 2012 at 7:01 pm

Posted in Development

Tagged with ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: