Why do we need a DB abstraction for MidCOM
Up until and including the MidCOM 2.4 strain, all components accessed the Midgard database directly, without any interface within MidCOM in between. When I started to integrate MgdSchema into what is eventually going to be MidCOM 2.6, I decided it was time to change that. Why, and what I want to change, is outlined in this article.
So what annoys me about what MidCOM currently does? To put it simply, I,
as a framework developer, can't hook any checks into the database
queries without implementing them in every component. This is mainly of
interest when it comes down to a) access control and b) more advanced
forms of cache invalidation.
But this is not all.
Right now there are several places within the MidCOM source tree where
component authors (including myself) encapsulate various Midgard
objects in their own component-level classes. None of these wrappers is
standardized, and none of them provides framework-driven features.
This gets especially important with the transition to MgdSchema,
where right now both Rambo and I have started to write our own wrapper
classes around the as yet incomplete MgdSchema objects out of sheer
necessity. While Rambo went for a more localized approach, I am
currently trying to get a general solution up and running for MidCOM.
My aims are to hide all these nifty little details from the
component author, providing an extended interface over what MgdSchema
offers at this time on a PHP level. In the future, part of these
features can then be superseded by a core implementation for
performance reasons (hence the full encapsulation). Also, I want to
have an easy way of adding new MidCOM features to all components at
once by adding a single patch of code to the core classes, instead of
changing all classes out there in the wild.
As usual nowadays, one major consideration will be performance; I
have neglected this particular point for too long now. (Well, actually
I would never have guessed that somebody would use MidCOM with hundreds
or thousands of leaves within a single topic...) One consequence is
that there won't be any fully automatic ACL checks throughout the
entire framework. Access control will still be done very selectively,
not generally.
The main reason for this is that merging the ACL is quite time
consuming, and the tree inheritance structure we have prevents the
check from being done at a simple SQL level. So both core and component
authors should keep a very careful eye on the ACL checks they do, to
keep their number as low as possible.
In addition (and this has to go on my todo list), there is a need for
an ACL cache, so that the merging of the information does not have to
be done every time a content object is accessed.
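A minimal sketch of what such an ACL cache could look like, written for current PHP rather than the PHP 4 style used elsewhere in this article. The class AclCache, its methods and the shape of the privilege array are assumptions made purely for illustration; they are not part of MidCOM.

```php
<?php
// Hypothetical sketch: AclCache is not a MidCOM class.
class AclCache
{
    /** @var array merged privileges, keyed by object id */
    private $merged = array();

    /** @var callable function ($objectId): array doing the expensive merge */
    private $merger;

    /** @var int how often the expensive merge actually ran */
    public $mergeCount = 0;

    public function __construct(callable $merger)
    {
        $this->merger = $merger;
    }

    // Return the merged privileges, running the merge only on a cache miss.
    public function get($objectId)
    {
        if (!array_key_exists($objectId, $this->merged)) {
            $this->merged[$objectId] = call_user_func($this->merger, $objectId);
            $this->mergeCount++;
        }
        return $this->merged[$objectId];
    }

    // Drop a single entry, for example after its permissions were edited.
    public function invalidate($objectId)
    {
        unset($this->merged[$objectId]);
    }
}

// Stand-in for the real merge, which would walk the content tree.
$cache = new AclCache(function ($objectId) {
    return array('read' => true, 'write' => ($objectId === 'topic-1'));
});

$first = $cache->get('topic-1');
$second = $cache->get('topic-1');      // served from the cache
$mergesAfterTwoReads = $cache->mergeCount;

$cache->invalidate('topic-1');
$cache->get('topic-1');                // merged again after invalidation
$mergesAfterInvalidate = $cache->mergeCount;
```

Since permissions are inherited down the tree, a real cache would probably also have to drop the entries of all descendants whenever a parent changes; the sketch leaves that out.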
All this will need some careful abstraction, so that all
requirements can still be met. And, what is more important, it has to be
good enough so that I can integrate the requirements that will arise in
the future.
What do I want to introduce exactly?
What I have started now is to write a set of base classes, inherited
from the current set of Midgard Schema classes. Since PHP does not
support multiple inheritance, all shared functionality has been put
into a single class, and the developer needs to subclass the MgdSchema
class and add all corresponding "external" member functions. This
sounds more complicated than it is, as all classes look just about the
same, differing only in the class/constructor name and three or four
lines of callbacks.
If you look at the source of the MidgardArticle wrapper class:
The minor differences between the classes lie solely in the name of
the class and its inheritance base class (lines 27 and 37) and in the
wrapper for the as yet missing delete operation in line 224, which will
be obsolete in the very near future.
One point which currently remains uncovered by MgdSchema is an
intelligent parent loader. MidCOM's idea of a content hierarchy (which I
am not prepared to sacrifice) defines that each object has exactly one parent. It may have multiple links to other objects, but it must have that single parent. If we give up this constraint, the ACL inheritance will ultimately fail.
Thus, to facilitate this, we have a get_parent
method, starting in line 396. It has to return the object which is the
immediate parent of the current object, or null if there is no such
object.
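To illustrate why the single-parent rule matters for ACL inheritance, here is a small sketch written for current PHP. DemoObject and merge_inherited_privileges are hypothetical names, and the merge rule (a setting closer to the object wins over an inherited one) is an assumption of mine, not a description of the actual MidCOM ACL code.

```php
<?php
// Hypothetical stand-in for a MidCOM database object.
class DemoObject
{
    public $name;
    public $privileges;   // privileges set directly on this object
    private $parent;

    public function __construct($name, array $privileges, $parent = null)
    {
        $this->name = $name;
        $this->privileges = $privileges;
        $this->parent = $parent;
    }

    // Returns the immediate parent, or null if there is no such object.
    public function get_parent()
    {
        return $this->parent;
    }
}

// Walk from the object up to the root, collecting privileges on the way.
function merge_inherited_privileges(DemoObject $object)
{
    $merged = array();
    for ($current = $object; $current !== null; $current = $current->get_parent()) {
        // The + operator keeps keys that are already present, so values
        // set closer to $object take precedence (assumed merge rule).
        $merged += $current->privileges;
    }
    return $merged;
}

$root = new DemoObject('root topic', array('read' => true, 'write' => false));
$topic = new DemoObject('news topic', array('write' => true), $root);
$article = new DemoObject('article', array(), $topic);

$result = merge_inherited_privileges($article);
```

Because every object has exactly one parent, the walk is a simple loop with a well-defined end. With several parents the result would depend on which path is followed.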
Everything else is relayed to the database object base class,
which is considered what C++ calls a "friend" of the DB object and may
therefore safely access private members of that class. This essentially
gives us the same semantics as if we had multiple inheritance.
How can we effectively maintain this?
Now this is the million dollar question. More or less at least.
Maintaining these wrapper classes by hand is certainly no option,
especially as the basic interface (which is identical for all classes)
can change over time.
So the next idea I had was writing a simple code generator so that
the average developer can automate the task of building these wrapper
classes. Then again, I thought, this is not really a good solution, as
we still don't have the kind of automatism you would have with
real-life wrapper classes, especially when I want to introduce new
features.
Ok, then we need a Plan C (no, not from outer space).
At this point in my thoughts, I remembered the way J2EE does this.
They face similar problems when implementing things like J2EE
container-managed persistence operations. The solution they
implemented was to create classes on demand when an application is
deployed for the first time.
What I find intriguing here is the fact that this keeps performance
at its best (as it produces regular code) while not inhibiting the
flexibility of development.
So let's pursue this idea of an integrated DB abstraction code generator.
The general idea
So what do we need in the end? The developer wants to have a class he
can use as usual, without much hassle, and which is derived from a
MgdSchema class like NewMidgardArticle.
That class should ideally consist only of those methods that need to be overridden on a PHP level like get_parent. All other things should be provided in some parent class instead.
In addition, due to the fact that we do not have a full inheritance
hierarchy, we will need some way of automatically determining
information related to the class. Especially interesting here are the
name of the original MgdSchema base class, the name of the MidCOM base
class that should be used, and the name of the table (currently called
"realm").
This meta information is especially important as MidCOM needs to be
able to convert to and from almost all kinds of Midgard objects that we
may encounter.
With this information, MidCOM will implicitly generate an intermediate
class, from which you in turn derive your application-level class.
These classes will be explicitly bound to a component or to the MidCOM
core, with strict namespacing.
A first example
Let us look at articles as they are now. Articles are historically stored in the table article, with the old Midgard class being named MidgardArticle and the MgdSchema class being named NewMidgardArticle. The information MidCOM will have for this class then looks roughly like this:
'table' => 'article',
'old_class_name' => 'MidgardArticle',
'new_class_name' => 'NewMidgardArticle',
'midcom_class_name' => 'MidCOMArticle'
With this information, we can already start off, defining an intermediate class:
class __MidCOMArticle extends NewMidgardArticle
{
    // Auto-generated interface code with stubs for all callbacks
}
Note the double underscore prefix, which indicates that this is an intermediate class not intended for direct use.
The application developer, in this case the MidCOM core team, will in
turn inherit from this class, creating the actual class to work with:
class MidCOMArticle extends __MidCOMArticle
{
    function MidCOMArticle($id = null)
    {
        // Keep the constructor chain intact.
        parent::__MidCOMArticle($id);
    }

    // Override what you need from this point on.
}
From this point, you can either use your class directly (in the case of
component-specific classes) or again inherit from the class (MyArticle extends MidCOMArticle).
Of course you can also define classes which are only found in MgdSchema. In that case you simply set the old_class_name property to null, indicating just this.
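The generation step for the intermediate class could be sketched roughly as follows. The function build_intermediate_class and the exact layout of its output are assumptions for illustration only; the real builder would emit the full interface code where the sketch only emits a comment. The sketch is written for current PHP.

```php
<?php
// Hypothetical generator; neither the name nor the output layout is
// taken from MidCOM.
function build_intermediate_class(array $definition)
{
    // The double underscore marks the class as intermediate.
    $intermediate = '__' . $definition['midcom_class_name'];
    $base = $definition['new_class_name'];

    $lines = array(
        "class {$intermediate} extends {$base}",
        '{',
        "    // Auto-generated for table '{$definition['table']}'.",
        '    // Interface code and callback stubs would be emitted here.',
        '}',
    );
    return implode("\n", $lines) . "\n";
}

$source = build_intermediate_class(array(
    'table' => 'article',
    'old_class_name' => 'MidgardArticle',
    'new_class_name' => 'NewMidgardArticle',
    'midcom_class_name' => 'MidCOMArticle',
));
```

The function returns plain PHP source as a string, so the result can be written to a file and loaded like any handwritten class.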
Organizational issues
A main point that arises here is the update semantics. Obviously, these
auto-generated classes need to be cached, as generating them on every
request would be inefficient.
For me, the most natural way would be to have definition files looking
roughly like our first example above, each of which maps to a class
file generated by MidCOM. If the definition file is newer than the
class file (a quick test), the class needs to be regenerated.
A similar test can be done with the last modification time of the
actual class builder, so that classes are automatically regenerated
when the class builder changes.
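The two timestamp tests described above could be sketched like this, written for current PHP. The function needs_regeneration and the three-file layout are assumptions for illustration; only filemtime(), file_exists(), touch() and the other built-ins are real PHP functions.

```php
<?php
// Hypothetical helper: decides whether a cached class file is stale.
function needs_regeneration($definitionFile, $builderFile, $cachedClassFile)
{
    // PHP caches stat results per request; make sure we see fresh values.
    clearstatcache();

    if (!file_exists($cachedClassFile)) {
        return true;
    }

    $cachedTime = filemtime($cachedClassFile);

    // Regenerate if either the definition or the builder itself is newer.
    return filemtime($definitionFile) > $cachedTime
        || filemtime($builderFile) > $cachedTime;
}

// Demonstration with temporary files and explicitly set timestamps.
$dir = sys_get_temp_dir();
$definition = tempnam($dir, 'def');
$builder = tempnam($dir, 'bld');
$cached = tempnam($dir, 'cls');

$now = time();
touch($builder, $now - 300);
touch($definition, $now - 200);
touch($cached, $now - 100);
$beforeEdit = needs_regeneration($definition, $builder, $cached);

touch($definition, $now);          // simulate editing the definition file
$afterEdit = needs_regeneration($definition, $builder, $cached);

unlink($definition);
unlink($builder);
unlink($cached);
```

The check costs only a few stat calls per definition file, which is why it can reasonably run every time a component is loaded.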
The classes in question will be located in the MidCOM cache directory,
and it should also be possible to have multiple classes defined in a
single file, for ease of management.
All classes will be loaded when the corresponding component loads.
If you plan to share the classes between components, it is recommended to put the class definitions into a shared library within MidCOM, which is then loaded by the corresponding components.
With some components it could make sense to keep the defined classes with a full blown component, for example if you build a component like n.n.orders which provides ways to "remote control" it.
How does MidCOM detect these "defined classes" then?
These classes will be declared as part of the component's interface. When the component loader starts up a component, it will read the list of defined classes and invoke the class loader/builder for each of them. This is, by the way, the same place where ACL permissions are defined, so we need extensions at this point anyway.
With the new component baseclass it should be as easy as adding something like this to the interface class constructor:
$this->_autoload_dbclasses = Array('myclassdef1', 'myclassdef2');
... where the class definitions are looked up in the config directory of the component.
These files are largely structured like MidCOM schema databases, being an Array definition. For performance reasons the class loader/builder will support multiple class definitions within a single definition file. So a full class definition file could look like this:
--- START OF FILE ---
Array
(
    'table' => 'article',
    'old_class_name' => 'MidgardArticle',
    'new_class_name' => 'NewMidgardArticle',
    'midcom_class_name' => 'MidCOMArticle'
),
Array
(
    'table' => 'topic',
    'old_class_name' => 'MidgardTopic',
    'new_class_name' => 'NewMidgardTopic',
    'midcom_class_name' => 'MidCOMTopic'
)
--- END OF FILE ---
As you can see, this is essentially an array of arrays which is defined here. The main difference to the MidCOM schema databases is the fact that the arrays are not indexed explicitly.
The main advantage here lies in the faster invalidation checks: the class builder/loader will generate a single PHP source file per class definition file, not one file per class.
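One way such a definition file could be read and turned into a single generated source is sketched below, for current PHP. Wrapping the file body in array(...) and evaluating it is an assumption on my part, as are the function names load_class_definitions and build_class_file; the class bodies are left empty to keep the sketch short.

```php
<?php
// Hypothetical loader: the file body is a comma-separated list of
// Array(...) definitions, so wrapping it in array(...) yields a list.
function load_class_definitions($fileContents)
{
    // eval() is only acceptable here because definition files are
    // trusted code shipped with the component.
    return eval('return array(' . $fileContents . "\n);");
}

// Hypothetical builder: one source string for all classes in the file.
function build_class_file(array $definitions)
{
    $source = "<?php\n";
    foreach ($definitions as $definition) {
        $source .= 'class __' . $definition['midcom_class_name']
            . ' extends ' . $definition['new_class_name'] . "\n{\n}\n";
    }
    return $source;
}

$contents = <<<'EOT'
Array
(
    'table' => 'article',
    'old_class_name' => 'MidgardArticle',
    'new_class_name' => 'NewMidgardArticle',
    'midcom_class_name' => 'MidCOMArticle'
),
Array
(
    'table' => 'topic',
    'old_class_name' => 'MidgardTopic',
    'new_class_name' => 'NewMidgardTopic',
    'midcom_class_name' => 'MidCOMTopic'
)
EOT;

$definitions = load_class_definitions($contents);
$classFile = build_class_file($definitions);
```

With one generated file per definition file, a single timestamp comparison covers all classes defined in it.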
Summary
While this might look like overkill at first sight, let me repeat the main reasons why I propose such a piece of code for MidCOM:
First, it will finally bring all operations within MidCOM's grasp. This is an investment in the future and is used only to a very limited extent right now, but it finally allows me to influence and control every object at runtime. In the long run, this will make many things easier.
Second, it will make building transition code easier, as you only have to modify a single place instead of over a dozen classes (perhaps more with OpenPSA 2). This is relevant for both the MgdSchema and the Multilang transitions within MidCOM.
Third, these cached classes are the only way to keep both performance and the advantages of object inheritance. The latter aspect is mainly about the human factor: wherever you work with non-MidCOM objects right now, you have to remember to call the MidCOM hooks yourself. This can be avoided in the future, for example by making the frequent invalidate() calls obsolete.