Saturday, December 3, 2011

Data Sharing Among Applications

Too Much Complexity - It's a big Pain In My Ass

I have this theory that most of the shit we do in IT is more complex than it needs to be.  There are entire industries built around creating new vocabularies for old concepts and packaging fundamental data processing concepts into new buzzwords - almost always making simple things less simple.

This is not new; it's been common practice for at least 15 years.  I think prior to about 15-20 years ago there were too few IT pros to get away with it - we would call BS on your "new idea" if it was already commonly understood or too obvious.

Recently, I've been thinking of how to de-couple applications that have grown up organically as close siblings.  You've seen this: each application has to have its own copy of the data from other apps, and we have an institutionalized scheme for noticing changes and pushing them to all interested parties.  Or one application is doing complex multi-table queries against the physical tables in another application's database.  Or, even worse, my app consumes some of your data and changes the status of the records directly in your DB.

Many people don't seem to think this is a problem or even particularly ugly.  I might just be a whiny old guy to think something more elegant would be better - maybe so but I'm gonna whine.

Problems with close sibling applications

You're too much in my business.  I mean where's the privacy?  Where's the respect?  And now I have to get your permission to re-arrange my furniture?  I feel constrained.  I feel put upon.  We need to re-evaluate our relationship.

All this everybody's-in-everybody-else's-business means it's really hard to make simple changes.  I might break your app.  You might break mine.  Who uses what?  What apps are at risk if I make this change?  It's all very scary and leads to paralysis, delays, and really frustrated business stakeholders that just want to get shit done.

Popular Solution Approaches

Generally you see a few approaches.  One is to decide that the entire enterprise is going to implement some sort of Enterprise Service/Data bus and become SOA everywhere.  Just a few million bucks on consulting, hardware, software, projects to create web services for all our apps, and we'll be good to go.  Three years - tops.

Another approach is to undertake a CMDB project.  We'll document all our dependencies, design a data store to put everything in and then we'll be able to see exactly what changes will affect what processes and applications.  Again it'll only cost a few million bucks and we should be able to do this in no time - two or three years.  Let's ignore for a moment that we somehow have to overcome our poor understanding of our own dependencies first.

For both of these you also have to ignore that the business still wants to get shit done, and business projects compete for the resources you're using on these "nice to have" IT initiatives.  They compete using business cases with revenue, presented by professional sales people.

In the old days we had the idea that there should be an enterprise database - a single, shared, easily accessible database that everybody would use.  We would define every data element in the data dictionary and if your app needed to have last names, you would go to the data dictionary to find out that last names are always varchar(25) or whatever.  Of course this was a complete failure, with data dictionaries having at least as many definitions for a standard last name as there were existing applications that use last name.  The idea is still attractive - but it just does not work in the real world.

My opinion is that these approaches are too complex to undertake as a project and are doomed to failure - I would love to hear some success stories though.

What we can start doing now

I'm also a big fan of Fundamentals - setting a pick, getting down on the ball, looking where you want to go - before advanced techniques like driving to the basket, turning a double play, and getting a knee down.

So I suggest starting with something simple.  One or two apps at a time, find out what data they share and how. 

If they're doing queries directly against your tables - move that to a view that also does whatever joins are required to give them the data set they need.  Think about things like - my app looks for a record in one table and if it's not found, queries a history table for the same info - could you create a view that allows me to look just once?
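Here's a minimal sketch of that "look just once" idea, using Python's built-in sqlite3 module so it runs anywhere.  The table names (orders, orders_history), the view name (shared_orders), and the columns are all made up for illustration; your schema will look different.

```python
import sqlite3

# Hypothetical schema: the owning app keeps live and archived orders
# in two separate tables. All names here are invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT);
    CREATE TABLE orders_history (order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT);

    INSERT INTO orders VALUES (1, 'Acme', 'OPEN');
    INSERT INTO orders_history VALUES (2, 'Globex', 'CLOSED');

    -- The view is the published contract; the tables behind it stay private.
    CREATE VIEW shared_orders AS
        SELECT order_id, customer, status, 0 AS archived FROM orders
        UNION ALL
        SELECT order_id, customer, status, 1 AS archived FROM orders_history;
""")


def find_order(conn, order_id):
    """Consumer-side lookup: one query, no knowledge of the history table."""
    return conn.execute(
        "SELECT order_id, customer, status, archived "
        "FROM shared_orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


live = find_order(conn, 1)      # row comes from orders
archived = find_order(conn, 2)  # row comes from orders_history
missing = find_order(conn, 3)   # None: not in either table
```

The consumer never learns that a history table exists.  If the owner later merges the two tables or splits them further, only the view definition changes.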

What if I need to change values in your data?  First explore whether that's really needed.  If it is, create a stored procedure that you control and I just call.  Then you are free to do whatever you need in your app as long as my view and update procedure stay the same - you don't even need to talk to me about it...
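To make the update side concrete, here's a rough sketch.  SQLite has no stored procedures, so a plain Python function stands in for one here; in a real database server this would be an actual stored procedure owned by the data's owner.  The function name set_order_status, the orders table, and the allowed status transitions are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    INSERT INTO orders VALUES (1, 'OPEN');
""")

# Hypothetical rules the owning app enforces; the caller never sees them.
ALLOWED_TRANSITIONS = {("OPEN", "PROCESSED"), ("PROCESSED", "CLOSED")}


def set_order_status(conn, order_id, new_status):
    """Owner-controlled entry point, standing in for a stored procedure."""
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no order {order_id}")
    if (row[0], new_status) not in ALLOWED_TRANSITIONS:
        raise ValueError(
            f"cannot move order {order_id} from {row[0]} to {new_status}"
        )
    with conn:  # commits on success, rolls back if the update raises
        conn.execute(
            "UPDATE orders SET status = ? WHERE order_id = ?",
            (new_status, order_id),
        )


# The consuming app calls the procedure instead of writing to the table.
set_order_status(conn, 1, "PROCESSED")
status_after = conn.execute(
    "SELECT status FROM orders WHERE order_id = 1"
).fetchone()[0]

# An update the owner does not allow is refused in one place.
try:
    set_order_status(conn, 1, "OPEN")
    rejected = False
except ValueError:
    rejected = True
```

The point is the shape, not the details: I call one thing with an id and a new status, and you decide what that means inside your own database.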

After this first step of creating Views and Procedures we can start thinking about creating endpoints.  Web Services, an intermediate Database server that Federates access without replicating data (use a data warehouse to Federate big data), or whatever other endpoint you can think of.

Views?  That's so 1990 - we are SOA all the way XML, HTML...

Views are an enabler that supports whatever you want to do next while providing immediate benefits - not the least of which is figuring out what data you actually share.

Views might be 1990 but just about everybody can understand a select statement.  And think about the benefits to testing.  Rather than needing a full-blown environment with all the services up and running, or mocking all those services, I can have test data in a spreadsheet or Access or MySQL or whatever I want and simply point the app to the datasource.  I can also peruse the data that my app sees without having to know all about the schema in the source DB.
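As a sketch of the testing benefit: if my app only ever selects from a name like shared_orders (a hypothetical view name, as are the columns below), then a test can hand it a throwaway datasource where that name is just a plain table full of test rows.  This uses an in-memory SQLite database as the stand-in datasource.

```python
import sqlite3


def count_open_orders(conn):
    """Application code: depends only on the shared_orders name and columns."""
    return conn.execute(
        "SELECT COUNT(*) FROM shared_orders WHERE status = 'OPEN'"
    ).fetchone()[0]


def make_test_source(rows):
    """Build a throwaway datasource where shared_orders is a plain table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shared_orders (order_id INTEGER, customer TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO shared_orders VALUES (?, ?, ?)", rows)
    return conn


# No source database, no services, no mocks: just rows behind the agreed name.
test_conn = make_test_source(
    [(1, "Acme", "OPEN"), (2, "Globex", "CLOSED"), (3, "Initech", "OPEN")]
)
open_count = count_open_orders(test_conn)
```

The app code can't tell whether shared_orders is a view over somebody's production tables or three rows I typed in for a test, and that's exactly what I want.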

Conclusion

It's our job to resist gratuitous complexity.  Go to the fundamentals that enable whatever is next and only add moving pieces that can carry their weight and more.

Thanks for reading - Mike