Pinderkent

Pain and glory from the trenches of the IT world.

The software impact were IBM to acquire Sun Microsystems.

Posted on Wednesday, March 18, 2009 at 11:36 AM.

There have been reports recently suggesting that IBM may soon acquire Sun Microsystems. For those of us in IT, acquisitions of these types are always a big deal. While Sun's server hardware is widely deployed and of great importance to many, my interest is mainly in software, so that's the area I'll focus on in this article.

In the software world, Sun is best known for Java and Solaris. I suspect that we'll see two very different outcomes for these technologies if an acquisition does occur. It's harder to say what will happen to some of the smaller or more obscure software products and projects that Sun is involved with, such as OpenOffice.org and NetBeans.

In terms of Java, I don't think we'd see much, if any, negative change. IBM has embraced Java, and understands its use, at least for business users. For some time now they've provided their own JDKs for a number of platforms. Java also plays a significant role in their prominent WebSphere Application Server. They're actively involved with developing new Java-based technologies. And their involvement in and contributions to Eclipse have had a huge impact within the development community.

In some respects, such an acquisition may be quite good for Java. Although known for generally being more conservative, IBM does have the resources needed to make the changes to the Java language and platform that are necessary for it to compete better with Microsoft's .NET platform. We need to see half-baked technologies like JavaFX and JavaFX Script discarded. But we also need to see functional or object-functional languages like Clojure and Scala perhaps brought in as core components of the platform.

If IBM put their weight behind such technologies, we may very well see more developers willing to adopt them. As we move into a world with massively multi-core CPUs, we will need to make use of functional programming techniques to write scalable software that effectively uses such hardware. Given IBM's (and Sun's) emphasis on high-end server hardware, a willingness to adopt and support Scala and Clojure could really put them in the lead in this area.

Things don't look as good in terms of operating systems. As basically everyone in the industry knows, Sun has offered Solaris for years, while IBM has offered AIX. While both are high-end UNIX-based operating systems, I'm not certain that they could be successfully integrated. At a technical level, I suspect they are just too different.

If an acquisition were to take place, I imagine that Solaris would be supported for some time, but eventually deprecated in favor of AIX. This would be similar to what happened to Tru64 UNIX when HP and Compaq merged. If I recall correctly, HP originally planned to transition Tru64's more advanced features to HP-UX, but this didn't end up happening, for the most part. Now Tru64 is essentially on its last legs.

Given that Sun has released much of the Solaris source code over the past few years in the form of the OpenSolaris project, it seems likely that it will live on in at least some form. A truly self-sustaining community, akin to the Ubuntu project for Linux, never seemed to develop, however. So it's difficult to say how much, if any, innovation we'd see out of the OpenSolaris project were Sun no longer supporting it.

Another interesting area to consider is that of the MySQL-related technologies and business that fell into Sun's fold with their acquisition of MySQL AB at the beginning of 2008. Given IBM's pivotal role in the development of relational databases, and their offering of the very solid and professional DB2 family of database products, MySQL's future in such an organization seems quite limited. Even within the open source community, MySQL is generally considered an inferior RDBMS. I really can't see it having much of a future, especially with Marten Mickos and Monty Widenius out of the picture.

We'll just have to wait and see what comes out of these rumors. But in terms of the software world, such an acquisition could have some significant impacts. They'd mainly be felt by those working with and developing enterprise systems, but may still be noticed by others, including, for example, OpenOffice.org users. So I see such an acquisition as potentially being good for Java, less so for Solaris, and potentially disastrous for MySQL.

Permalink: http://pinderkent.phumblog.com/post/2009/03/the_software_impact_were_ibm_to_acquire_sun_microsystems

Mistakes are prevalent within PHP- and MySQL-based software systems.

Posted on Saturday, March 14, 2009 at 6:13 PM.

There was recently a posting at The Daily WTF site entitled The Quest for the Unique ID. It gives an example of a software system that generated unique invoice identifiers by randomly generating a value, checking if that identifier had already been used by an existing database record, and repeating until an unused value was found.
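
For contrast, here is a rough sketch of the usual alternative: letting the database hand out the identifier. It assumes SQLite through Python's standard sqlite3 module, and the invoice table, its columns, and the create_invoice helper are hypothetical names of my own, not anything taken from the article.

```python
import sqlite3

# In-memory database with a hypothetical invoice table (illustrative names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice ("
    " invoice_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " customer TEXT NOT NULL)"
)


def create_invoice(conn, customer):
    """Insert an invoice and return the identifier assigned by the database."""
    cur = conn.execute("INSERT INTO invoice (customer) VALUES (?)", (customer,))
    conn.commit()
    return cur.lastrowid


# No random guessing and no lookup loop: each insert gets a fresh key.
first = create_invoice(conn, "Acme")
second = create_invoice(conn, "Globex")
```

Other database systems offer sequences or identity columns for the same purpose. The point is only that uniqueness is the database's job, not something to be found by trial and error.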

Some people may laugh and doubt that software like this exists, but after years in the industry, one sees enough mistakes of that type to know they are a very real problem. No matter what platforms or technologies are used, software will be written incorrectly. Usually this is unintentional, and due to unclear specifications, typing mistakes, misunderstandings, and so forth. But there are other problems, like that of The Daily WTF article, that go far beyond being bugs.

Almost any software developer, regardless of training or experience, should be able to see the obvious problems with the approach that was described in the article. Unfortunately, there are some software developers who do not, for whatever reason. It's difficult to even consider such people "developers" or "programmers", because they lack even the most basic knowledge of the craft. Having worked with a very wide variety of software systems, platforms, programming languages, and database systems, I have to say that I've seen most of these types of mistakes in software developed using PHP and MySQL.

I suspect this has to do with the general attitude within both of those communities. Namely, they have focused on quickly developing software, without putting much emphasis on quality, security, and reliability. Both implementations, for instance, have a long history of poor performance, numerous security flaws, numerous bugs (that often aren't fixed for years), poor architecture decisions, and a lack of critical features. Much of the software that is written in PHP and uses MySQL tends to absorb these negative traits, and then somehow manages to amplify them into the creation of spectacularly horrible software systems like that of The Daily WTF article.

Far too often I've been at the bookstore and seen books that promise to teach both PHP development and MySQL development in just a few hundred pages. Unfortunately, such books appeal to those without much, if any, software development experience. And once they have read one such book, they come away mistakenly believing that they're on par with professionals who have spent years studying database and software development. Soon enough they've convinced somebody to hire them, and soon after a business software system has been developed that generates unique identifiers by random trial-and-error, avoids the use of primary keys, avoids the use of foreign keys and other constraints, is vulnerable to SQL injection attacks, performs extremely poorly, and is full of bugs.
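
To make the point about keys and constraints concrete, here is a minimal sketch, again assuming SQLite through Python's sqlite3 module. The customer and invoice tables and the try_insert helper are hypothetical and purely for illustration. With a primary key and a foreign key declared, the database itself rejects the duplicate and orphaned rows that these systems tend to accumulate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE invoice ("
    " invoice_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer (customer_id))"
)
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice (invoice_id, customer_id) VALUES (100, 1)")
conn.commit()

INSERT_INVOICE = "INSERT INTO invoice (invoice_id, customer_id) VALUES (?, ?)"


def try_insert(conn, sql, params):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(conn, INSERT_INVOICE, (100, 1))  # invoice_id already used
orphan_ok = try_insert(conn, INSERT_INVOICE, (101, 999))   # no such customer
valid_ok = try_insert(conn, INSERT_INVOICE, (102, 1))      # satisfies both constraints
```

Only the last insert is accepted. Application code never has to check for these cases by hand, which is exactly the work the constraint-free systems described above push onto (and usually lose in) the application layer.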

Over time, I have become more and more hesitant to take consulting jobs that involve PHP and/or MySQL. The systems we see are often so broken that there is little that can be salvaged. From the database model to the highest levels of the UI, it's not a matter of fixing minor bugs or architectural deficiencies. Most of the time, the entire system is in dire need of replacement. Unfortunately, this often proves to be a process that most clients can neither afford nor justify. But perhaps if things had been done more correctly in the first place, they wouldn't be in such a bad position. Thus the lesson we can usually take away is that PHP and MySQL should be avoided whenever possible.

Permalink: http://pinderkent.phumblog.com/post/2009/03/mistakes_are_prevalent_within_php_and_mysqlbased_software_systems

Frameworks causing unnecessary complexity in enterprise software systems.

Posted on Sunday, February 22, 2009 at 5:04 PM.

A few weeks ago, Travis Jensen posted an article about the low quality of many enterprise software systems. Those of us who have worked in such environments are well aware of what he is saying. The quality of such software is often quite terrible. And he does nail a big portion of the problem: complexity.

In some domains, there is inherent complexity. We may be working with huge amounts of data, we may have to process many requests simultaneously, we may have to deal with ambiguous or unclear business rules, and so forth. But many projects also introduce unnecessary complexity at the technical level. This seems to be what Travis was focusing on, as he discusses the unnecessarily large software stack he has to work with daily.

We often see these unruly software stacks in Java-based systems. The Java development community has always been far more in favor of frameworks than many other development communities, even when working on implementing similar software systems. And even within the different types of frameworks, we often see three or more competing projects. Given that a huge amount of enterprise-grade software is developed using Java, it's no surprise that we're likely to run into systems making use of several such frameworks.

While these frameworks are often useful in smaller applications that need to be developed very quickly by a small team, they typically lose their benefit in the larger, more complex and longer-term projects we see in the enterprise. Aside from the increased complexity and the debugging difficulties they bring, as mentioned by Travis in his article, the effort necessary to integrate these frameworks often exceeds the benefit they initially bring to the table.

At some companies I've worked with, it was almost seen as a crime within the development team to write any significant amount of code for a project. The emphasis was completely on letting the framework do the work, with the developers only writing light integration code. As anyone who has worked with such frameworks knows, this often isn't possible, due to performance reasons, unusual requirements, or even the deployment environment.

I've witnessed some of these teams spend excessive amounts of time fighting to integrate the various third-party frameworks of their software stack. The code they write to integrate these components ends up being more than what they would've written if they'd just rolled it all by hand. And as Travis highlighted in his article, debugging can become a real nightmare. Tracking the original source of a single field displayed on the UI can become a tedious journey through Web template frameworks, the code integrating that with an underlying business layer, through to the code making up a data layer or an ORM framework, and sometimes down to hand-written SQL code. Simple bug fixes can indeed become very time-consuming tasks.

Some of those development groups have moved towards reducing their use of frameworks. If this approach is taken with an existing application, we can end up with a real mess. Some portions of the system will use some ORM or data persistence layer, while others directly invoke SQL code or stored procedures. This takes what may have already been a confusing situation, and makes it far more convoluted.

There's probably no easy solution to these problems. We may just need to accept that for larger and more complex software systems, it's better to invest in developing much of the system from scratch. While frameworks may give an initial advantage, for longer-term projects they become a nuisance, and even an additional problem to work around when modifying the system or adding new functionality. And given that many of these frameworks manage to work their way into the overall architecture in a manner that does not easily allow them to be removed later, it may be best just to not get involved with them in the first place.

Permalink: http://pinderkent.phumblog.com/post/2009/02/frameworks_causing_unnecessary_complexity_in_enterprise_software_systems

It's rare to see a database schema as clean as MediaWiki's.

Posted on Saturday, January 24, 2009 at 3:36 PM.

Over the past week or so, I've seen links to this diagram of the MediaWiki database schema posted on a number of other sites and blogs. Now, MediaWiki is no minor piece of software. It is used to power Wikipedia, which Alexa currently states is one of the top ten most popular Web sites on the Internet.

At many of the companies I have consulted with or worked for, it is rare to find a document or diagram as precise and effective as the one for the MediaWiki database schema. It's even rarer to find a real-world database schema that is so consistent and sensible. Clearly, a great amount of care and experience has gone into the development of this database.

Even without studying it for very long, it is apparent that it is a rather clean database schema. One thing that is apparent almost immediately is the prefixing of the table columns. This is a technique often used to reduce the ambiguity between identically-named columns of separate tables when those tables are joined. It also helps make it clearer which table a given column comes from when looking at the results of a query.
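
A small sketch of what the prefixing buys us, using SQLite through Python's sqlite3 module. The two tables are drastically simplified stand-ins loosely modeled on MediaWiki's page and revision tables; the exact columns and the sample data are my own assumptions for illustration, not a copy of the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name
conn.executescript(
    """
    CREATE TABLE page (
        page_id    INTEGER PRIMARY KEY,
        page_title TEXT NOT NULL
    );
    CREATE TABLE revision (
        rev_id      INTEGER PRIMARY KEY,
        rev_page    INTEGER NOT NULL REFERENCES page (page_id),
        rev_comment TEXT
    );
    INSERT INTO page (page_id, page_title) VALUES (1, 'Main_Page');
    INSERT INTO revision (rev_id, rev_page, rev_comment)
        VALUES (10, 1, 'Initial import');
    """
)

# Because every column carries its table's prefix, the join needs no
# table qualifiers or aliases, and the result columns explain themselves.
row = conn.execute(
    "SELECT page_id, page_title, rev_id, rev_comment"
    " FROM page JOIN revision ON rev_page = page_id"
).fetchone()
columns = row.keys()
```

Had both tables simply called their key "id", the same query would need qualifiers such as page.id and revision.id, and the result set would contain two columns with the same name.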

The one minor inconsistency I do see is that the names of some tables contain words that are separated with an underscore (for example, user_groups, user_newtalk, page_restrictions and site_stats), while others do not separate the words (for example, filearchive, oldimage, pagelinks, searchindex, and recentchanges). At least the tables related to certain functionality tend to have similar naming conventions.

The normalization and the relationships between the tables appear to be quite sensible. The short descriptions, even if only a sentence or two long, do significantly help make it clear why a certain table exists. This is something that is lacking from many enterprise databases.

Performance is clearly an issue, so we see some caching-related tables in the top-right corner of the current database schema diagram. We see other tables, like job, testitem and testrun, that show that this is a real-world database where we have external batch jobs, as well as testing. This isn't just some academic exercise dreamed up in a classroom where the necessities of the real world are not to be found.

Any piece of software backed by a database should strive to have a similar schema diagram. And it needs to be kept up-to-date as changes are made to the database. This single schema diagram teaches us much about the MediaWiki architecture, but it also speaks to the quality of the product. Software that is built upon a sensible data model often proves to be very natural, and thus in many cases easier to develop, which often leads to a higher-quality product. If a production database schema can be kept as clean as that of MediaWiki, then we'll likely see greater programmer productivity, and higher software quality.

Permalink: http://pinderkent.phumblog.com/post/2009/01/its_rare_to_see_a_database_schema_as_clean_as_mediawikis

Why not just write the SQL code?

Posted on Saturday, January 24, 2009 at 2:28 AM.

Lately, there has been a trend towards taking a high-level language, and using it to write SQL-like queries. In the .NET world, the best known example is LINQ. Today I saw a similar, albeit much simpler, approach for Python.

These approaches differ from many ORM systems in that we're still thinking in terms of raw queries, and not as much in terms of objects and the relationships between them, and how the data maps to tables and the relationships map to constraints. Likewise, they differ from SQL embedded within another language, in that we don't actually write much, if any, SQL code.

But after looking at the example Python code, I'm not sure I understand how it is better than just writing the SQL code in the first place. We're still making use of most of the same concepts as in SQL. The general form of the query is the same. We're selecting rows containing certain columns from joined tables or subqueries, subject to certain filters, and sorted in a particular order. There's nothing remarkable happening there.

If anything, we end up writing queries that are nearly identical to the equivalent SQL, but in a syntax that is more awkward to work with than SQL is. And we don't gain any extra functionality or abilities in exchange for this loss of developer productivity. We don't benefit in any way by avoiding SQL code, and instead just writing Python code that is virtually identical to that SQL code, but with a more awkward syntax.

Most decent programming languages offer some way to easily interface with a wide variety of database servers, typically also offering easy ways to parameterize the SQL code passed to the database to increase the level of security. So for the vast, vast majority of database queries, we should be able to put the SQL code inline, parameterize it, and pass in the parameter values when we execute the query. We should never have to resort to string concatenation, or anything of that sort. We just write pure SQL, within a string in the host language. It's as simple as that. There is absolutely no need for the SQL-like adaptations of the syntaxes of non-SQL programming languages.
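
As a minimal sketch of what I mean, here is inline, parameterized SQL using Python's standard sqlite3 module. The employee table, its columns, the sample rows, and the well_paid helper are all hypothetical, made up only to have something to query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " emp_name TEXT NOT NULL,"
    " emp_dept TEXT NOT NULL,"
    " emp_salary INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee (emp_name, emp_dept, emp_salary) VALUES (?, ?, ?)",
    [("Ann", "IT", 90000), ("Bob", "IT", 60000), ("Cy", "Sales", 70000)],
)
conn.commit()

# Plain SQL in a string; :dept and :min_salary are placeholders, not
# spots where values get pasted into the text.
QUERY = """
    SELECT emp_name, emp_salary
      FROM employee
     WHERE emp_dept = :dept
       AND emp_salary >= :min_salary
     ORDER BY emp_salary DESC
"""


def well_paid(conn, dept, min_salary):
    """Run the query with the values bound as parameters."""
    params = {"dept": dept, "min_salary": min_salary}
    return conn.execute(QUERY, params).fetchall()


rows = well_paid(conn, "IT", 50000)
# A hostile value is treated as data, so it simply matches no department.
hostile = well_paid(conn, "IT' OR '1'='1", 0)
```

The query reads as ordinary SQL, the driver handles the values, and no string concatenation is involved anywhere.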

Permalink: http://pinderkent.phumblog.com/post/2009/01/why_not_just_write_the_sql_code