Bas Geertsema.net

SaaS and the need for dynamic languages

by Bas 25. August 2009 13:38

A key distinction between SaaS and non-SaaS is the housing of multiple tenants in a single instance. This requires mass-customization techniques. Most of the customization, or variability, has to be defined at runtime; (re)deploying the instance should be kept to a minimum! As a result, the software engineer has to design the application with support for runtime variability. For more advanced customization scenarios, such as business logic and workflows, you will need a (Turing-complete) programming language. Statically typed languages are at a serious disadvantage here compared to dynamic (scripting) languages, as the former require compilation, building and linking, which is not nearly as easy as feeding source code to a simple interpreter.

My guess is that statically typed languages will continue to be used for the instance itself, the core application. On top of that, user customizations, or user applications, will mostly be developed in dynamic languages such as JavaScript or Python, as they are much easier to work with at runtime than the current generation of statically typed languages such as Java and C#.
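As an illustration of the runtime-variability point, here is a minimal Python sketch. It assumes tenant rules are stored as source text; the `TENANT_RULES` dictionary, the `load_rule`/`update_rule` helpers and the `discount` rule are all hypothetical. A tenant's rule is compiled and loaded while the instance keeps running, so changing a rule needs no rebuild or redeploy.

```python
from typing import Callable, Dict

# Hypothetical store of tenant-authored rules. In a real system this would be
# a database table that tenants edit through the application's UI.
TENANT_RULES: Dict[str, str] = {
    "acme": (
        "def discount(order_total):\n"
        "    return order_total * 0.10 if order_total > 1000 else 0.0\n"
    ),
    "globex": (
        "def discount(order_total):\n"
        "    return 25.0\n"
    ),
}


def load_rule(tenant_id: str, name: str) -> Callable:
    """Compile a tenant's rule source at runtime and return the named function."""
    # An empty __builtins__ narrows what the rule can call, but this is NOT a
    # security sandbox; untrusted code needs proper isolation.
    namespace: dict = {"__builtins__": {}}
    code = compile(TENANT_RULES[tenant_id], f"<rule:{tenant_id}>", "exec")
    exec(code, namespace)
    return namespace[name]


def update_rule(tenant_id: str, source: str) -> None:
    """Change a tenant's behaviour without rebuilding or redeploying the instance."""
    compile(source, f"<rule:{tenant_id}>", "exec")  # reject syntax errors early
    TENANT_RULES[tenant_id] = source


acme_discount = load_rule("acme", "discount")
print(acme_discount(2000))  # 200.0
```

Emptying `__builtins__` only limits what a rule can reach. It is not a security boundary, so a production system would still need real isolation for tenant-supplied code.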

Tags:

Computer | Professional | English

Multi-tenancy... do you really need structured data?

by Bas 18. August 2009 15:26

One of the big architectural decisions you have to make in multi-tenant software is how to store and partition your data. In many cases you will choose an RDBMS such as SQL Server, MySQL or Oracle. The choice of an RDBMS is well founded: you get powerful querying, transactions, recovery, backups, indexing and management capabilities. Sure, an RDBMS might not easily scale out, but by scaling up you can get a long way.

Once you go the RDBMS route, you have to figure out a database schema that suits your application. Of course it has to support multi-tenancy. And since you run only a single instance of your application, the application itself must make sure it queries the correct tenant's data. In your database design you basically have three options:

- A single database for each tenant

- A shared database, with a separate schema for each tenant

- A shared database, shared schema

This has all been explained well in an MSDN document, so that is not what I want to focus on. Instead, I want to share some design struggles I had with it.

I have recently been quite busy figuring out which path to take. The difficult part is how to deal with tenant-specific customizations. For example, different tenants might extend the same business entity with different attributes. This does not align well with a relational database, which only supports a fixed schema. One option is to design a very generic and flexible schema, in which case all variability is handled by your application layer. But this tends to lead to awkward and inflexible querying, an unclear schema with column names like attribute1, attribute2, attribute3, and a lot of meta-data. The second option is to modify the schema at runtime, which is obviously only possible if each tenant has its own schema or database; a shared schema rules it out. Runtime modification of database schemas seemed like a sensible approach: just use DDL for schema modifications and introspection, plus an application layer doing the translation work. But this quickly turns out to be complex and error-prone. For example, what do you do with schema updates in your application? And how do you upgrade existing tenant data to a revised schema?
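To make the generic-schema option concrete, the following Python sketch uses the standard `sqlite3` module with an in-memory database as a stand-in for SQL Server. It shows a generic table with `Attribute1` to `Attribute3` columns and a meta-data table that maps each tenant's field names onto those columns. The table layout, tenant names and the `find_by_field` helper are hypothetical. The same logical field, `ShoeSize`, lives in a different column for each tenant. Every query therefore needs a meta-data lookup plus SQL text assembled at runtime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One generic table for all tenants; AttributeN means something different per tenant.
    CREATE TABLE Customer (
        Id         INTEGER PRIMARY KEY,
        TenantId   TEXT NOT NULL,
        Name       TEXT NOT NULL,
        Attribute1 TEXT,
        Attribute2 TEXT,
        Attribute3 TEXT
    );
    -- Meta-data: which generic column holds which tenant-specific field.
    CREATE TABLE FieldMap (
        TenantId   TEXT NOT NULL,
        FieldName  TEXT NOT NULL,
        ColumnName TEXT NOT NULL,
        PRIMARY KEY (TenantId, FieldName)
    );
    INSERT INTO FieldMap VALUES ('acme',   'ShoeSize', 'Attribute1');
    INSERT INTO FieldMap VALUES ('globex', 'Region',   'Attribute1');
    INSERT INTO FieldMap VALUES ('globex', 'ShoeSize', 'Attribute2');
    INSERT INTO Customer VALUES (1, 'acme',   'Alice', '42',   NULL, NULL);
    INSERT INTO Customer VALUES (2, 'globex', 'Bob',   'EMEA', '44', NULL);
""")

GENERIC_COLUMNS = {"Attribute1", "Attribute2", "Attribute3"}


def find_by_field(tenant_id: str, field: str, value: str) -> list:
    # Step 1: every query first needs a meta-data lookup...
    row = conn.execute(
        "SELECT ColumnName FROM FieldMap WHERE TenantId = ? AND FieldName = ?",
        (tenant_id, field),
    ).fetchone()
    if row is None:
        raise KeyError(f"tenant {tenant_id!r} has no field {field!r}")
    column = row[0]
    # Step 2: ...and the column name must be spliced into the SQL text, because
    # placeholders bind values, not identifiers. Whitelist it first.
    if column not in GENERIC_COLUMNS:
        raise ValueError(f"unexpected column {column!r}")
    sql = f"SELECT Name FROM Customer WHERE TenantId = ? AND {column} = ?"
    return [name for (name,) in conn.execute(sql, (tenant_id, value))]


print(find_by_field("globex", "ShoeSize", "44"))  # ['Bob']
```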

Dealing with these problems, I figured that the real problem might be that I was locked in an ‘it has to fit in SQL’ mindset. When I came to think of it, a lot of these customer extensions just end up in some forms or reports. They do not interfere with the core functionality of the application. Why, then, should you store this data in a highly structured way, with all the associated hassle? Why not use, let’s say, SQL Server’s semi-structured XML data type to store all this tenant-specific, and possibly ambiguous, data? The World Wide Web is largely semi-structured and ambiguous, but it works pretty well in the end, doesn’t it? And if it works for the web, why should it not work for me?
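Here is a minimal sketch of this idea in Python, with some assumptions stated up front. SQLite has no XML column type, so a plain `TEXT` column stands in for SQL Server's `xml` type, and `xml.etree.ElementTree` serializes each tenant's extra attributes. The `Customer` table, its `Extensions` column and the `to_xml`/`from_xml`/`load_customer` helpers are hypothetical. The fixed columns stay relational, while each tenant can attach whatever fields it likes without a schema change.

```python
import sqlite3
import xml.etree.ElementTree as ET


def to_xml(extensions: dict) -> str:
    """Serialize a tenant's extra attributes into a small XML document."""
    root = ET.Element("ext")
    for key, value in extensions.items():
        ET.SubElement(root, "field", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(document: str) -> dict:
    """Parse the XML document back into a name -> text dictionary."""
    root = ET.fromstring(document)
    return {field.get("name"): field.text for field in root.findall("field")}


conn = sqlite3.connect(":memory:")
# Structured, shared columns plus one semi-structured column for extensions.
conn.execute(
    "CREATE TABLE Customer ("
    " Id INTEGER PRIMARY KEY, TenantId TEXT NOT NULL,"
    " Name TEXT NOT NULL, Extensions TEXT NOT NULL DEFAULT '<ext/>')"
)
conn.execute(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    (1, "acme", "Alice", to_xml({"ShoeSize": 42, "FavouriteColour": "blue"})),
)
conn.execute(
    "INSERT INTO Customer VALUES (?, ?, ?, ?)",
    (2, "globex", "Bob", to_xml({"Region": "EMEA"})),
)


def load_customer(customer_id: int) -> dict:
    name, extensions = conn.execute(
        "SELECT Name, Extensions FROM Customer WHERE Id = ?", (customer_id,)
    ).fetchone()
    return {"Name": name, **from_xml(extensions)}


print(load_customer(1))
# {'Name': 'Alice', 'ShoeSize': '42', 'FavouriteColour': 'blue'}
```

Values come back as strings, because the XML document carries no type information. That is part of the trade-off of giving up structure for this kind of data.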

My current approach, therefore, is to make an explicit distinction between the structured and the semi-structured data in my application. As it turned out, the structured data is largely identical across tenants. This allows me to adopt a shared database, shared schema approach. This may or may not be the best way to go security-wise (tenants should never, ever be able to see each other’s data), but at least I have the option. The distinction also leads to a much more elegant and robust design with less complexity.
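With a shared schema, the security concern above comes down to never forgetting the tenant filter. One common way to enforce that is to route all data access through an object bound to a single tenant, which adds the `TenantId` predicate to every statement. The Python sketch below does this with `sqlite3`; the `TenantRepository` class and the table layout are hypothetical, not the author's code.

```python
import sqlite3


class TenantRepository:
    """Hypothetical data-access object bound to one tenant.

    Every statement it issues includes the TenantId predicate, so code using
    it cannot accidentally read or write another tenant's rows."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def add_customer(self, name: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO Customer (TenantId, Name) VALUES (?, ?)",
            (self._tenant_id, name),
        )
        return cur.lastrowid

    def customer_names(self) -> list:
        rows = self._conn.execute(
            "SELECT Name FROM Customer WHERE TenantId = ? ORDER BY Id",
            (self._tenant_id,),
        )
        return [name for (name,) in rows]

    def customer_name(self, customer_id: int):
        # Filtering on TenantId as well as Id means an Id belonging to
        # another tenant simply finds nothing.
        row = self._conn.execute(
            "SELECT Name FROM Customer WHERE TenantId = ? AND Id = ?",
            (self._tenant_id, customer_id),
        ).fetchone()
        return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (Id INTEGER PRIMARY KEY,"
    " TenantId TEXT NOT NULL, Name TEXT NOT NULL)"
)
# Index on TenantId so the per-tenant filter stays cheap as the table grows.
conn.execute("CREATE INDEX IX_Customer_TenantId ON Customer (TenantId)")

acme = TenantRepository(conn, "acme")
globex = TenantRepository(conn, "globex")
alice_id = acme.add_customer("Alice")
globex.add_customer("Bob")

print(acme.customer_names())           # ['Alice']
print(globex.customer_name(alice_id))  # None
```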

So, the next time you are struggling with a database schema, rethink it: does it really need to be structured? Does it really need to be indexable and queryable? Or does it allow for a semi-structured approach? Choosing less rigidity and more flexibility might save you a lot of trouble down the road.

Tags:

English | Computer | Professional

A developers’ perspective on SaaS

by Bas 12. June 2009 11:03

A contemporary buzzword in software development circles is without a doubt ‘SaaS’, or Software as a Service. The dust stirred up by the introduction of ‘SOA’ (Service Oriented Architecture) has only just settled, and here comes another windstorm. What SaaS precisely entails is certainly not well defined, and maybe never will be before it is outdated by yet another acronym, but I will leave that aside for a moment. Many people consider a pure SaaS to be a software system that can be scaled enormously with high efficiency and low costs. Archetypical SaaS products are GMail and Salesforce.com. The service that GMail offers, managing your e-mail, is a fairly standard one: everybody uses e-mail in more or less the same way. That makes it an ideal domain for a single-instance, scalable SaaS solution. But how can this be done for, let’s say, a typical ERP implementation? ERP products are often highly customized towards their users, and this customization can provide a competitive advantage for businesses. Forcing every organization into a single ERP implementation is a no-go.

In this view, the benefits of economies of scale and scope can only be achieved by having a single instance of your software that houses all of your ‘tenants’. Depending on the context, tenants are customers, clients or users. This is in contrast with a typical ASP (Application Service Provider), which may run a separate instance for each and every client and therefore has to deal with a lot of costly overhead in managing all these instances.

To still reap these benefits in a single-instance, multi-tenant SaaS application, the software itself has to offer this customization to its tenants. And this is in fact what Salesforce.com, or any other more complex SaaS solution, offers: tools and techniques that let tenants add extra data fields to entities, define custom business workflows, etc. The result is that a SaaS solution can no longer be seen as a single software application, but rather as a (high-level) business platform on which its users can ‘build’ their businesses.

And that is where model-driven development might play an important role. Let the business users construct their models, and make your application the runtime machine on which these models are executed. This requires a paradigm shift for most software developers, who consider the software they construct to be an end-of-the-line, shrink-wrapped application. Instead, a SaaS developer constructs the runtime machine on which businesses (tenants) can develop their own solutions. Your software becomes just another layer in the stack.
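The Python sketch below is a toy illustration of ‘your application is the runtime machine’. The tenant's model is plain data (field definitions and allowed workflow transitions), and a generic `ModelRuntime` validates records and enforces transitions by interpreting that data. The model format, the `INVOICE_MODEL` example and the class are hypothetical and far simpler than what a platform like Salesforce.com offers. The point is only that changing the model changes behaviour without touching the runtime's code.

```python
# Hypothetical tenant-authored model: plain data, editable at runtime,
# no tenant-specific code is compiled or deployed.
INVOICE_MODEL = {
    "entity": "Invoice",
    "fields": {
        "customer":  {"type": "str",    "required": True},
        "amount":    {"type": "number", "required": True},
        "po_number": {"type": "str",    "required": False},
    },
    # Allowed workflow transitions: state -> states it may move to.
    "workflow": {
        "draft":     ["submitted"],
        "submitted": ["approved", "rejected"],
        "rejected":  ["draft"],
        "approved":  [],
    },
}

# Maps the model's type names onto Python types.
TYPES = {"str": str, "number": (int, float)}


class ModelRuntime:
    """Generic engine: its behaviour comes entirely from the model it is given."""

    def __init__(self, model: dict):
        self.model = model

    def validate(self, record: dict) -> list:
        errors = []
        for name, spec in self.model["fields"].items():
            if name not in record:
                if spec["required"]:
                    errors.append(f"missing {name}")
            elif not isinstance(record[name], TYPES[spec["type"]]):
                errors.append(f"{name} must be {spec['type']}")
        for name in record:
            if name not in self.model["fields"]:
                errors.append(f"unknown field {name}")
        return errors

    def transition(self, state: str, target: str) -> str:
        if target not in self.model["workflow"].get(state, []):
            raise ValueError(f"transition {state} -> {target} not allowed")
        return target


runtime = ModelRuntime(INVOICE_MODEL)
print(runtime.validate({"customer": "Acme", "amount": 99.5}))  # []
print(runtime.validate({"amount": "lots", "colour": "red"}))
# ['missing customer', 'amount must be number', 'unknown field colour']
print(runtime.transition("draft", "submitted"))  # submitted
```

Adding a field or a workflow state for one tenant then becomes a data change to that tenant's model, not a new release of the runtime.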

For the typical SaaS programmer this means a shift in focus away from delivering end products and towards creating a more or less generic application platform on which businesses can build their own applications. Of course, exactly how generic depends on the goals of your business. Typically, the more specific your application platform, the easier it is to construct, but the more constrained the applications that can be built on top of it. This is as much a business decision as a technical one.

I am currently researching the use of model-driven development and generative programming for a paper. I plan to write more on my blog about these techniques as they relate to the ideas behind SaaS. As I am in the middle of re-engineering an existing web application in the spirit of SaaS, I will also be writing about the technical challenges and hurdles one has to overcome.

Tags:

Computer | Professional | English

Application tiers as a supply chain; empower the interface tier!

by bas 28. February 2008 11:42

I have been playing around with LINQ for the last couple of weeks. It is really nice to work with: it changes the way you work with sets of data and transform them into other sets of data. But I don't want to focus on the utility of LINQ here. Instead I want to look at an underlying paradigm shift that we might encounter and that I have been thinking about lately: drawing an analogy between application tiers and a supply chain, and giving power back to the interface tier.

It is generally considered good practice to split up your application into multiple tiers. This enhances reusability, flexibility and scalability, all good qualities to strive for. But it comes at a cost: to enable a strict separation of concerns, the tiers taken together will be larger in code size and more complex. For example, we have all created applications in which the event handler of a button directly prepared a SQL query string and sent it straight to the database server at hand. Not good practice in many ways, but very straightforward and effective. Now suppose you introduce a method where the SQL preparation and subsequent database fetching take place. This method is more flexible and can be used from multiple locations in your application code (maybe two or three other button click event handlers). The method's versatility is reflected in the method itself by some added if or switch statements. A bit more complex, but no real problem here.

We take the next step and really separate the interface from the data access layer. So we set up a separate project which only concerns itself with fetching and updating data in the database. It is unaware of the front-end and therefore creates many abstractions that the front-end will use. This might lead to some redundancy. For example, it might check data that was already checked in the front-end, or it might make calls to the database to retrieve data that was already retrieved before, without the data layer being aware of it. This might be a slightly bigger problem: the performance hit of re-fetching data from the database can be considerable. But CPU and memory are cheap, so you don't really need to worry about it these days.

As the application grows bigger and bigger, the decision is made to implement some kind of middleware, or business layer, at the boundary between your interface tier and your data tier. This middle tier must be quite flexible, because it abstracts away both the data retrieval and the interface functionality. Since it has little knowledge of how either side will use it, the middle tier can do nothing but work at the level of the lowest common denominator: whole business entities, all validations, all authentication, authorization, etc. That is good practice in itself, because it is the only way you can reliably say that your business logic and integrity rules will be executed. But again you might face even more redundant functionality across the interface tier, the business logic tier and the data access tier.

But is this really necessary?! Data access layers and business layers are often designed with this question in mind: 'which information do we supply?'. But should the emphasis not instead be on the question 'what information does the interface want?'. Do we push the data? Or do we explicitly pull the data?

For example, in our original application we decided to fetch the name field of an entity (e.g. a user) using in-line SQL code. We ask for the name, and that is what we get. Nothing more, nothing less. Now consider the 3-tier application. We ask for the name, but there is no such method in the middleware, so we retrieve the whole entity instead. The middleware checks whether we have the proper authorization to fetch the entity, and then whether we have the authorization to read the name field. The data access layer is unaware of these checks and, just to make sure, performs them again. The middle tier is also aware of the whole entity hierarchy, so for convenience it implicitly figures out the exact (derived) type of the entity, even though the name field we are interested in is defined in the root entity. Maybe a bit exaggerated, but I hope you get my point: there is a lot of unnecessary overhead involved. Sure, we needed all this functionality when we wanted to retrieve the whole entity, use polymorphic fetches, etc. But we do not need it in this case. We just want the name of the entity. Of course proper validation and authorization are still needed, but only the kind that is directly related to the data we want. We don't want any data we don't need, and certainly no functionality acting upon that unneeded data.

Now consider the three tiers working together as a typical supply chain in the retail industry. In the field of operations research, the inflation of supplied resources relative to the original demand is well known as the bullwhip effect. This effect occurs when every party in the supply chain adds a 'safety' margin to the amount of products requested from it. These margins accumulate up the chain and result in supply orders that overestimate the original demand. So even though the end customer may have requested only ten products, by the time the order reaches the raw materials manufacturer it may have grown to twenty. You can clearly see the waste of resources caused by the mismatch between supply and actual demand. The solution in this field of research has been a focus on demand-driven supply: the original customer demand must stay intact as far upstream in the supply chain as possible. This makes it most likely that supply equals actual demand.
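The arithmetic behind the ten-becomes-twenty example is easy to sketch. Assume, hypothetically, four upstream stages that each add a 20% margin to the order they receive. The order then grows as 10 * 1.2**k at stage k, which is roughly 20.7 after four stages. This Python sketch follows the simplified margin-accumulation description above rather than a full inventory model:

```python
def upstream_orders(demand: float, margins: list) -> list:
    """Order size at each upstream stage when every stage adds its own
    safety margin (as a fraction) to the order it receives."""
    orders = []
    current = demand
    for margin in margins:
        current = current * (1 + margin)
        orders.append(current)
    return orders


print([round(o, 2) for o in upstream_orders(10, [0.2, 0.2, 0.2, 0.2])])
# [12.0, 14.4, 17.28, 20.74]
print(upstream_orders(10, [0, 0, 0, 0]))  # [10, 10, 10, 10]
```

With all margins at zero, the demand-driven ideal, every stage orders exactly the original ten.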

I see analogies in the software systems we are building, but we may even have a bigger problem. The retail industry has focused on the customer rather than the supplier for a while now. The software industry, by contrast, still treats the 'customer' (in this case the interface tier or the actual user) as a mere consumer of supplied services and information, not as the creator of demand for specific services and information. Who is in charge of your information retrieval? Is it truly the business layer that decides which information to supply? Does the interface tier merely make convenient use of all the entities the middleware provides? Or is it the interface tier that specifies exactly which data it is interested in? I believe it should be the latter.

There might be some improvements to be made here. Yes, it is bad practice to access the database directly from your button click event handler. But that does not mean it is equally wrong to specify what you want in a query format and hand it over to the other tiers. For example, if I wrote this in my event handler, would it really be wrong?

SELECT Name FROM User WHERE User.Id = 123

Is this not exactly what it needs? How could it be more terse and self-explanatory? Still, most programmers would cringe at the mere sight of code like this in the interface. But what if I could just hand this query over to the middle tier for further processing? The middle tier would then add business logic and validation logic to the query definition and hand it over to the data tier. That tier would in turn add some database-specific filters to the query and execute it. The resulting data (a single string representing the name of the entity) would be returned to the middle tier and back to the interface tier. The overhead is minimal, since the original request for information stays intact all the way to the data tier. We get nothing more and nothing less than we need.
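The flow just described can be sketched in Python with `sqlite3`, using clearly hypothetical names. An immutable `QuerySpec` carries the interface's request. The middle tier appends an ownership rule (`OwnerId = ?`, an invented authorization policy), and the data tier appends a storage-level filter (`IsDeleted = 0`) before executing. Only values travel as `?` parameters. Table and column names are spliced into the SQL text, so they must come from trusted code, never from user input.

```python
import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QuerySpec:
    """Hypothetical immutable query definition passed between tiers."""
    table: str
    columns: tuple
    conditions: tuple = ()  # SQL fragments using ? placeholders
    params: tuple = ()      # values bound to those placeholders, in order

    def where(self, condition: str, *params) -> "QuerySpec":
        # Returns a new spec; earlier tiers' conditions are never lost.
        return replace(self, conditions=self.conditions + (condition,),
                       params=self.params + params)

    def to_sql(self) -> str:
        sql = f'SELECT {", ".join(self.columns)} FROM "{self.table}"'
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        return sql


def interface_tier() -> QuerySpec:
    # States exactly what the screen needs: one user's name.
    return QuerySpec("User", ("Name",)).where("Id = ?", 123)


def business_tier(query: QuerySpec, current_user_id: int) -> QuerySpec:
    # Hypothetical rule: users may only read records they own.
    return query.where("OwnerId = ?", current_user_id)


def data_tier(conn: sqlite3.Connection, query: QuerySpec) -> list:
    # Storage-level filter, then a single execution of the combined query.
    query = query.where("IsDeleted = 0")
    return conn.execute(query.to_sql(), query.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "User" (Id INTEGER PRIMARY KEY, Name TEXT,'
             ' OwnerId INTEGER, IsDeleted INTEGER NOT NULL DEFAULT 0)')
conn.executemany('INSERT INTO "User" (Id, Name, OwnerId) VALUES (?, ?, ?)',
                 [(123, "Alice", 1), (124, "Bob", 2)])

print(data_tier(conn, business_tier(interface_tier(), current_user_id=1)))
# [('Alice',)]
```

Because the spec is immutable and only executed inside `data_tier`, any tier can call `to_sql()` to inspect the request before anything runs.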

To come back to the technology I started with, I believe LINQ is one way to enable a strategy like this. Its deferred execution makes it possible to adjust the query definition in every tier before it is executed. That makes it possible to inject validation and authorization logic into the query, or to inspect the query just before executing it, instead of making assumptions about how the supplied data will finally be used.
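LINQ itself is a .NET feature, so what follows is only a Python analogy of deferred execution, not LINQ's API. A hypothetical `LazyQuery` records `where` filters and a `select` projection, each tier adds its own filter, and nothing touches the data source until the query is iterated. As a simplification, the projection is always applied after all filters. That way a tier can still filter on fields the interface did not select.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Row = dict


class LazyQuery:
    """Hypothetical deferred query over an in-memory source."""

    def __init__(self, source: Callable[[], Iterable[Row]],
                 filters: Tuple[Callable[[Row], bool], ...] = (),
                 projection: Optional[Callable[[Row], Any]] = None):
        self._source = source
        self._filters = filters
        self._projection = projection

    def where(self, predicate: Callable[[Row], bool]) -> "LazyQuery":
        # Returns a new query; nothing is executed here.
        return LazyQuery(self._source, self._filters + (predicate,), self._projection)

    def select(self, projection: Callable[[Row], Any]) -> "LazyQuery":
        return LazyQuery(self._source, self._filters, projection)

    def __iter__(self) -> Iterator[Any]:
        # Execution happens only here, when a consumer starts iterating.
        for row in self._source():
            if all(predicate(row) for predicate in self._filters):
                yield row if self._projection is None else self._projection(row)


calls = {"source": 0}
USERS = [
    {"Id": 123, "Name": "Alice", "OwnerId": 1, "IsDeleted": False},
    {"Id": 124, "Name": "Bob", "OwnerId": 2, "IsDeleted": False},
]


def load_users() -> list:
    calls["source"] += 1  # counts how often the data source is actually hit
    return USERS


# Interface tier: states exactly what it wants.
query = LazyQuery(load_users).where(lambda u: u["Id"] == 123).select(lambda u: u["Name"])
# Middle tier: injects a hypothetical authorization rule for current user 1.
query = query.where(lambda u: u["OwnerId"] == 1)
# Data tier: injects a storage-level filter.
query = query.where(lambda u: not u["IsDeleted"])

print(calls["source"])  # 0: nothing has executed yet
print(list(query))      # ['Alice']
print(calls["source"])  # 1
```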

Empower your interface, it knows best, but make sure your data and business layers are your law-enforcement troops.

Tags:

Professional

Social Graphs and the semantic web

by bas 5. February 2008 14:27

Google has created an API to look up a social graph. I found it particularly interesting to come across this, since I had been thinking about the same concept for some time; I even wrote something about it a while ago. The driving idea behind it is that we have many fragmented online identities. But, of course, there is only one real identity: the body you carry around with you all the time. It makes sense to somehow connect these online identities to that single real identity. With the immense information base available at Google, it should be possible to do some smart page indexing and extract this information. But that is not what the Social Graph API does: it uses the XFN standard to extract high-quality, consistent information. An example of an XFN-compatible tag would be:

<a href="http://tanya.example.org" rel="friend met colleague">...

In this example you specify not only the URL the hyperlink points to, but also the relationship. This drives the formation of a semantic web, in which pages and resources are not only linked together but their relationships are also described. Very nice stuff, but at the same time also vulnerable to abuse and privacy infringement. Use it wisely.
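A consumer of XFN only needs the `rel` attribute of each link. The minimal Python sketch below uses the standard library's `html.parser` to collect each link's target and its XFN relations, ignoring non-XFN `rel` values such as `nofollow`. The `extract_xfn` helper is hypothetical, and the sketch makes no claim about how Google's service works.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XfnLinkParser(HTMLParser):
    """Collects (href, [relations]) for every <a> whose rel contains XFN values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attribute names arrive lower-cased
        href = attributes.get("href")
        rel = attributes.get("rel") or ""
        relations = [value for value in rel.lower().split() if value in XFN_VALUES]
        if href and relations:
            self.links.append((href, relations))


def extract_xfn(html: str) -> list:
    parser = XfnLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<p><a href="http://tanya.example.org" rel="friend met colleague">Tanya</a> '
        '<a href="/about" rel="nofollow">About</a></p>')
print(extract_xfn(page))
# [('http://tanya.example.org', ['friend', 'met', 'colleague'])]
```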

Tags:

Professional

XBAP does not deliver as a smart client

by bas 11. January 2008 22:43

Recently I decided to investigate real smart clients further, partly because I am just fed up with all these DHTML / AJAX workarounds. However wonderful the supporting libraries are these days, the result is still in no way comparable to a decent desktop application. Fortunately, I am developing an online application for which we can get away with demanding certain minimum system requirements on the user's side, such as Windows XP / Vista and IE7. From this fortunate position I am investigating the use of WPF applications to create a superb interactive interface. I started out with XBAP, which stands for XAML Browser Application. In short, XBAP applications run inside web pages (for example within an IFRAME) and can use almost the full WPF feature set. This is unlike Silverlight, which at the moment provides only a small subset of WPF, with Microsoft still busy implementing the first primitive controls. There is a catch, however: the client needs to run a modern Windows platform with IE6 or IE7 and must have .NET Framework 3.0 or higher installed. In my case this was no showstopper, so I continued investigating XBAP and tried to create some test applications.

What did turn out to be the showstopper was the security sandbox in which XBAP applications run. By default, XBAP applications run in Partial Trust, which is severely limited. I had problems setting cookies and accessing web resources (even from the same host?), and I was not able to perform any .NET Remoting due to security restrictions, although I was able to use some WCF web services. It took me some time to figure all this out, and I noticed there is very, very little information about XBAP on the internet, most of it not positive at all. It turned out that XBAP, with its sandbox, is just too restricted to be meaningful as a smart client platform.

Just as I was about to give up, I turned my eye to developing regular WPF (desktop) applications and using ClickOnce for online deployment. ClickOnce supports automatic downloads and automatic updates, without a traditional setup procedure. In short, it takes care of all the hassle of software deployment, which is one of the main reasons web applications are so popular. After the initial download of your application, the user is prompted to accept the installation, and from that point on your application can run freely in Full Trust and be the smart client you really want it to be.

... and that is, of course, the reason XBAP has been around for a year and a half (as far as blogs and forum posts go) but is still not really used. Even though Microsoft markets XBAP as some kind of smart client toolset, it just isn't one. For a real smart client you need to get out of that partial trust and use a regular WPF application with ClickOnce deployment. And for some cool graphics without real functionality, you might as well turn to Adobe Flash (widely supported) or even Microsoft Silverlight (small download and available on multiple platforms).

Tags:

Professional

Powered by BlogEngine.NET 1.4.5.0
Theme by Mads Kristensen
