
Zend Db 2.0 Kicks Off

Posted on June 25, 2010

[Photo: Red Onions by Darwin Bell (Creative Commons Attribution 2.0 Generic)]
Ralph Schindler has started the ball rolling on requirements for Zend Db for Zend Framework 2.0.  He announced on the ZF Contributors mailing list:

 

Requirements have been solicited from both community members in various conversations, as well as looking through the issue tracker for feature requests that have been on the backlog due to potential BC breakage. This document reflects those ideas, and it’s now in a position where we’d like to start a discussion on the direction outlined inside it.

I worked on Zend_Db during the push to Zend Framework 1.0, so I was interested to see what the goals are for the next major revision.

Create Zend\Db\Registry

This is promising.  Lots of apps need to use multiple database servers transparently.  Zend Db should support a Composite that maps queries to multiple Db objects, delegating according to policies defined by the developer.
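To sketch what I mean (the class and method names here are my own invention, not anything from the proposal): a composite adapter holds several connections and asks a developer-supplied policy which one should handle each statement.

```php
<?php
// Hypothetical sketch only; none of these names come from the requirements
// document.  A composite delegates each statement to one of several
// underlying adapters, based on a policy the developer supplies
// (e.g. writes to a master, reads to slaves).
interface RoutingPolicy
{
    /** @param string $sql  @param array $adapters  @return object the chosen adapter */
    public function pickAdapter($sql, array $adapters);
}

class CompositeAdapter
{
    private $adapters;
    private $policy;

    public function __construct(array $adapters, RoutingPolicy $policy)
    {
        $this->adapters = $adapters;
        $this->policy   = $policy;
    }

    public function query($sql, $bind = array())
    {
        // The policy decides which underlying connection runs this statement.
        $adapter = $this->policy->pickAdapter($sql, $this->adapters);
        return $adapter->query($sql, $bind);
    }
}
```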

Pluggable architecture

This is tricky.  The risk is that one will run afoul of the YAGNI principle, creating plugin hooks that virtually no one needs.  It's a good time for the developer community to step up and document compelling use cases.  Everyone should instantly recognize how important and useful it would be to have a plugin hook RIGHT THERE.
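By a plugin hook I mean something like the following, an invented example rather than anything proposed: a point in the query lifecycle where arbitrary callbacks can run. The question is which such hooks would actually earn their keep.

```php
<?php
// Invented example of a plugin hook: callbacks fired before query execution,
// handy for logging or profiling.  Every hook like this adds surface area,
// so each one needs a documented use case.
class HookableAdapter
{
    private $callbacks = array();

    public function onPreQuery($callback)
    {
        $this->callbacks[] = $callback;
    }

    public function query($sql, array $bind = array())
    {
        foreach ($this->callbacks as $callback) {
            call_user_func($callback, $sql, $bind);   // e.g. write to a query log
        }
        // ...then delegate to the real connection...
    }
}
```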

Distinct abstractions

I.e., decouple the connection adapter from query abstraction features.  Decoupling is often good, but too much of a good thing can make a system harder to use.  Why exactly would one need to do this?  Do we need to mix and match an Oracle connection with a MySQL query abstraction class?
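For concreteness, here's roughly what I understand the proposed split to mean: a query-building object that knows nothing about the driver, handed to whichever connection adapter you like. This is a hypothetical sketch; the names are mine, not the proposal's.

```php
<?php
// Hypothetical illustration only; none of these class names come from the
// requirements document.  The query abstraction produces SQL and knows
// nothing about which PHP driver executes it.
interface Connection
{
    public function execute($sql, array $bind = array());
}

class MysqliConnection implements Connection
{
    public function execute($sql, array $bind = array()) { /* uses ext/mysqli */ }
}

class PdoMysqlConnection implements Connection
{
    public function execute($sql, array $bind = array()) { /* uses pdo_mysql */ }
}

class Select
{
    private $table;
    private $columns = array('*');

    public function from($table, array $columns = array('*'))
    {
        $this->table   = $table;
        $this->columns = $columns;
        return $this;
    }

    public function toSql()
    {
        return 'SELECT ' . implode(', ', $this->columns) . ' FROM ' . $this->table;
    }
}

// The same Select works against either driver:
// $sql  = $select->from('bugs', array('bug_id', 'summary'))->toSql();
// $rows = $mysqliConnection->execute($sql);
// $rows = $pdoConnection->execute($sql);
```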

This gets to a more general guideline about writing software requirements documents:  the justification for a feature is not the absence of that feature.  The requirements need to be clearer about the problem to be solved, rather than about any particular solution.

Addition of DDL query abstraction

This is very hard.  First, DDL varies between vendors even more than DML does.  I put a lot of work into supporting pseudokeys in a vendor-neutral way.  Don't get me started on data types; there's hardly a single data type that every RDBMS vendor supports identically, not even INTEGER.
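To illustrate (my own examples, not anything from the requirements document), here's how far apart vendors are on something as basic as an auto-incrementing pseudokey column:

```php
<?php
// My own illustration of how much vendor DDL diverges for a simple
// auto-incrementing pseudokey; not taken from the requirements document.
$createTable = array(
    'mysql'  => 'CREATE TABLE bugs (bug_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY)',
    'pgsql'  => 'CREATE TABLE bugs (bug_id SERIAL PRIMARY KEY)',
    'oracle' => 'CREATE TABLE bugs (bug_id NUMBER(10) PRIMARY KEY)',
              // ...plus a separate CREATE SEQUENCE and a trigger on Oracle
    'mssql'  => 'CREATE TABLE bugs (bug_id INT IDENTITY(1,1) PRIMARY KEY)',
    'sqlite' => 'CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY AUTOINCREMENT)',
);
```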

Second, vendor-proprietary DDL features are important.  Think of special index types, table partitioning, storage engine options, or fulltext search functions.  There is very little commonality between vendors in these features,  but when you need them, you really need them.

Third, do DDL operations really need to be run from PHP applications (besides phpMyAdmin)?

Addition of a Metadata sub-component

This is a good idea.  Now that it's 2010, we should be able to rely more on standard system tables to tell us about the schema.  Also, you shouldn't have to wire related objects together by assigning their pseudokeys yourself; you should just add one object as a dependent of another, and the objects would consult their metadata to know how they're related.
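Something along these lines is what I have in mind. This is a purely hypothetical sketch with invented class and method names, not the proposed API:

```php
<?php
// Hypothetical sketch of metadata-driven association; the API is invented
// here for illustration and is not part of the ZF 2.0 proposal.
// Assumes $db is a configured adapter.
$accounts = new TableGateway('accounts', $db);
$bugs     = new TableGateway('bugs', $db);

$account = $accounts->find(1);
$bug     = $bugs->createRow(array('summary' => 'crash on save'));

// Instead of the caller copying $account->account_id into $bug->reported_by
// manually, the table consults INFORMATION_SCHEMA (or the vendor's catalog)
// to discover the foreign key from bugs.reported_by to accounts.account_id
// and fills it in.
$bug->setParent($account);
$bugs->save($bug);
```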

However, I recommend strongly:  eliminate cascading update/delete in PHP code.  I regret implementing this.  The RDBMS engine is the only place cascading logic can be implemented while preserving data integrity.

Separate Table & Row Gateway into sub-components of their own

This is another solution in search of a justification.  Anyone who uses an ORM for long realizes that it’s a leaky abstraction; you need to write hand-crafted SQL more often than advocates of ORM frameworks will admit.  The Table gateway is for operations on one table.  Many ordinary queries operate on multiple tables.  Making the Table class be the sole provider of insert(), update(), and delete() operations creates an awkward coupling between those operations and a single table.
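For instance (my own illustration, assuming a configured adapter in $db): the gateway handles the single-table write nicely, but the everyday reporting query right next to it falls outside the gateway and ends up as hand-written SQL anyway.

```php
<?php
// My own illustration, assuming $db holds a configured Zend_Db adapter.
// The Table gateway is a fine fit for single-table writes...
$bugsTable = new Zend_Db_Table(array('name' => 'bugs', 'db' => $db));
$bugsTable->insert(array('summary' => 'crash on save', 'reported_by' => 1));

// ...but an ordinary multi-table query doesn't belong to either table's
// gateway, so it's expressed as SQL against the adapter instead.
$report = $db->fetchAll('
    SELECT a.account_name, COUNT(*) AS open_bugs
    FROM bugs b
    JOIN accounts a ON a.account_id = b.reported_by
    WHERE b.status = ?
    GROUP BY a.account_name
', array('OPEN'));
```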

Better testability in the Unit Tests

Yes indeed.  Testing has come a long way in the past three-plus years.  It’s true that much of the unit testing of Zend Db classes should be done in isolation (as unit testing should be), using test doubles such as mocks and stubs.
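Here's the kind of isolated test I mean: a hand-rolled stub adapter so the class under test never opens a real connection. This is a sketch with invented class names (BugRepository is not part of Zend Db); it only illustrates the test-double approach, assuming PHPUnit.

```php
<?php
// Sketch only: a hand-rolled stub stands in for the real adapter, so logic
// above the connection layer can be tested without a database.
// BugRepository is an invented example class, not part of Zend Db.
class StubAdapter
{
    public $lastSql;
    private $rows;

    public function __construct(array $rows) { $this->rows = $rows; }

    public function fetchAll($sql, $bind = array())
    {
        $this->lastSql = $sql;   // record what was asked
        return $this->rows;      // return canned data
    }
}

class BugRepository
{
    private $db;

    public function __construct($db) { $this->db = $db; }

    public function findOpenBugs()
    {
        return $this->db->fetchAll("SELECT * FROM bugs WHERE status = 'OPEN'");
    }
}

class BugRepositoryTest extends PHPUnit_Framework_TestCase
{
    public function testFindOpenBugsFiltersByStatus()
    {
        $adapter = new StubAdapter(array(array('bug_id' => 1)));
        $repo    = new BugRepository($adapter);

        $bugs = $repo->findOpenBugs();

        $this->assertEquals(1, count($bugs));
        $this->assertContains("status = 'OPEN'", $adapter->lastSql);
    }
}
```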

Test setup is another justification for DDL abstraction in Zend Db, but this would be a lot of work for very little gain.  It would be far easier to load vendor-specific SQL scripts through the vendor’s client.

I wrote the infamous Zend Db unit testing framework out of desperation to have some way to run the same set of tests against all supported RDBMS vendors.  You need to run tests against live databases; I can't emphasize this enough.  I found and fixed dozens of bugs during development of Zend Db that were bugs only for one database vendor.

Base Plugins / Type Converter

The problem with type converters that map SQL data types to native PHP types is that some types supported by SQL can't map to native PHP types.  The best examples are 64-bit integers and unsigned integers.  Some SQL data types can only map to PHP strings.
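A quick illustration of why strings end up being the only safe mapping for some types (my own example):

```php
<?php
// On a 32-bit PHP build, a BIGINT value from SQL overflows the native int:
$value = 9223372036854775807;   // max signed 64-bit integer
var_dump($value);               // float(9.2233720368548E+18) -- precision lost

// Fetching the column as a string preserves the value exactly, which is why
// a type converter often has no better option than string for BIGINT,
// unsigned 64-bit, DECIMAL, and similar types:
$value = "9223372036854775807";
var_dump($value);               // string(19) "9223372036854775807"
```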

What’s missing?

What about some of the long-running feature requests for Zend Db, like stored procedures, character sets, and support for queries that can’t run as prepared statements?  What about clarifying the difference between a Model and a data layer?

In this first pass at Zend Db 2.0 requirements, I saw a lot of implementation plans but not as many requirements as I would define them.  I blogged about this when Matthew Weier O’Phinney wrote a higher-level set of goals for Zend Framework 2.0 some months ago.

Basically, the architecture changes are interesting only to a rarefied audience.  But you can’t pitch a product based on how it’s architected or implemented. You pitch a product based on what users can do with it, and what problems it solves.  What problems do users of Zend Db have, and how will they use Zend Db 2.0 to solve those problems?



Responses and Pingbacks

Elizabeth M Smith on June 25th, 2010 at 11:58 am

I can think of a VERY good example of decoupling the connection adapter from query abstraction features – mysql vs. mysqli vs. pdo_mysql – three different ways to talk to the database that all need the same db abstraction.

pdo_dblib, pdo_mssql, mssql, sqlserver and the sql server PDO version
All the same backend query abstraction needed – all those different ways of talking to the db….

Thanks Elizabeth, you’re right, that’s a legitimate use case. So it should be in the requirements document. That’s the point I’m making. The requirements doc is where one spells out the use cases and the problems that need to be solved. But software architects naturally want to jump straight to talking only about the solutions.

The description of a proposed implementation belongs in a separate document. We can think of this as another kind of decoupling. If you determine that a given solution isn’t right, you can put that document aside and write a new one, but you shouldn’t touch the original document that records the requirements.

If the “requirements” document mixes requirements and solutions, then updating the document when you change your mind about a solution is harder, like working on spaghetti code. Perhaps that’s why software developers typically don’t like to write docs.

This is exactly the type of discussion I wanted to see kicked off! Thanks for the feedback, Bill. While that document is in its infancy (like I mentioned, a springboard for this kind of discussion), I will definitely take everything you wrote above into consideration to expand on it.

Elizabeth, that is one of the primary use cases that this abstraction is attempting to solve. In layman’s terms, I was proposing that there should be a separate abstraction between “how” you speak to the database, and “what” you speak to the database.

I will ping you when there are updates to this document. Again, thanks for the write-up Bill!

Maybe a too-lofty goal for just ZF, but there seems to be a real need to define an interface as a common baseline for all of PHP. Well, PDO should have been that, but it isn’t, for various reasons.

An approach might be to first sit down and define interfaces for the immediate layer that is needed for low-level db interaction, and then for higher-level stuff like DML and DDL.

Later on we can see what needs to be ported to C for optimal performance, and this could even be left to the vendors who want to gain a competitive advantage. The model of a common core seems to have hindered cooperation and, as a result, left PDO lingering next to the old extensions.

Hi Lukas. I think it’s a bit outside of the Zend Framework mission as I understand it. I thought ZF was intended to be 100% PHP code, and depend only on standard PHP extensions, not custom extensions. The ideas have merit, however IMHO they belong in a different project.

Well what I am suggesting is pure PHP in step one. The defined interface would be implemented on top of the existing extensions.

Then, as a next step, vendors could decide to give things a speed boost by implementing the interface in C.

I think what Lukas was getting at is that if a common layer can be pounded out, then it could be the basis of a new data layer for PHP, one that individual organizations (such as Oracle and Microsoft) could make contributions to. The discussion is purely outside of ZF, but to boil it down: once a decent API fleshes itself out, there is definitely a need for it at the common extension level.

Thanks Shawn, I do understand. The subject is off-topic for this blog.

What about a clearer connection / interaction or community with ORMs (Doctrine as the favorite)?

In my opinion this will be used in more and more cases in the future, but of course you also need the good, general-purpose access which Zend_DB provides.

It would be great if you could use both in combination, for example by providing an adapter to get the Zend_DB connection out of Doctrine, or by giving Doctrine a connection to work with, without the need for extra configuration.

What do you think about it?

You wrote that you strongly recommend that we “eliminate cascading update/delete in PHP code”.
I concur, as I had to use this feature in Doctrine in some cases. It’s obviously super slow compared to in-DB cascades, but the DB just can’t check all the conditions that the code does.
If you don’t allow for cascading deletes, then the validation code will have to select all related records, run something like canDelete() and delete based on the outcome. You’d rather do a cascaded delete in code in such a case. It should at least be an option, but obviously per object/table.

Well, DB-level cascades are nice, unless you want to execute some logic. Going the stored-procedure route is often not the best option then, because of lack of experience, lack of expressibility in limited stored procedure syntax, the resulting duplication of code, lack of good debugging tools, etc.

So ORM-level cascade does make sense. Of course the user must be aware that it could set off a gigantic number of queries.

The first problem is that a PHP script that attempts to implement cascading changes to a database cannot make those changes atomically.

Example: Suppose you define foreign key constraints in the DB but you try to implement cascading updates in client code. Do you change the parent’s primary key column first, or the child’s foreign key column? Either way violates the constraint. The only solution is to drop, disable, or defer the constraint while you make the changes. This spoils concurrency.

The second problem is that a client script can leave changes incomplete, resulting in breaks in referential integrity.

Example: Suppose you don’t define FK constraints, and you go to change a PK and its child FK. You change the PK, but then your script has a fatal error and exits. Now your FK is orphaned.

You can try to use transactions to mitigate this, but this introduces the possibility of deadlock where none existed before.

You can make arguments about lack of developer experience or poor tools or you don’t like stored procedure syntax or whatever, but these don’t change the fact that it simply DOESN’T WORK to implement cascading operations in client code.

ORM level cascade does NOT make sense. This has nothing to do with the number of queries, it has to do with ensuring consistency. Ensuring consistency must be the responsibility of the RDBMS.
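For reference, the declarative version is a single clause, and the engine applies it atomically inside the same transaction as the parent change. This is my own example (ordinary SQL, nothing Zend-specific), assuming $db is a configured adapter:

```php
<?php
// My own example: the in-database version of a cascade.  The engine applies
// it atomically, with no window in which child rows can be orphaned.
$db->query('
    ALTER TABLE bugs
      ADD CONSTRAINT fk_bugs_reported_by
      FOREIGN KEY (reported_by) REFERENCES accounts (account_id)
      ON UPDATE CASCADE
      ON DELETE CASCADE
');
```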

@Julian: I’m not sure I understand the use cases you’re suggesting. In my mind, using Doctrine would be mutually exclusive with using Zend_Db. Doctrine is a perfectly nice library, but if you’re using it, why would you use Zend_Db in the same app?

I think that one of the things that Zend_Db is really missing is support for clusters of servers.  We have a number of apps where there are several slaves for every master.  What we’ve had to do was implement a wrapper around Zend_Db that kept track of the types of queries being performed, and whenever a write came up, we had to make sure to open a connection to the master.  In the case of most reads, we made sure to open a connection to a slave.

Is there any chance of seeing similar functionality in Zend_Db?

@Chris, I agree with the requirement for managing connections to multiple databases transparently. I think that’s the idea behind the proposed Zend\Db\Registry that’s mentioned. I.e., they’re already thinking of this as a desirable feature in ZF 2.0.

I’d like to see it implemented in a way that allows an app developer to define their own code to plug in and control how queries are balanced across servers. You give an example of writes going to one server and reads going to another server, but what if you balance reads across three slaves, and you know one of them is running on older server hardware, so you want only 20% of the read queries to go to that server instead of 33%? It’d be nice to have a hook in the Composite class so you could write your own custom logic for this.
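For example, the application could supply something like this weighted pick (the names are invented for illustration, not part of any proposal):

```php
<?php
// Invented illustration of a pluggable balancing policy: each slave gets a
// weight, and reads are distributed in proportion to those weights.
class WeightedReadPolicy
{
    private $weights;   // e.g. array('slave1' => 40, 'slave2' => 40, 'slave3' => 20)

    public function __construct(array $weights) { $this->weights = $weights; }

    public function pickSlave()
    {
        $total = array_sum($this->weights);
        $roll  = mt_rand(1, $total);
        foreach ($this->weights as $name => $weight) {
            $roll -= $weight;
            if ($roll <= 0) {
                return $name;   // this slave handles the read
            }
        }
    }
}
```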

