Pinderkent

Pain and glory from the trenches of the IT world.

A more generalized replacement for NoScript's "surrogate scripts" functionality.

Posted on Sunday, January 25, 2009 at 11:18 PM.

I saw an article today about the "surrogate scripts" functionality of the NoScript extension for Mozilla-based Web browsers. It essentially allows a user to provide a replacement for a blocked script that some Web site requires in order to function properly.

I worked with a Web developer once who took a similar, although more general, approach to this problem. To avoid seeing ads and other unwanted content, he used his /etc/hosts file to resolve the domain names of the hosts serving that content to localhost. But some of the sites he visited often did make the mistake of having scripts that relied upon a function or variable defined in one of the scripts he had blocked. Some of the functionality of these sites was thus unavailable.

As a Web developer, he often switched back and forth between browsers, and often just used the last one he had open after testing work he had done for a client. So he wanted to avoid a browser-specific plugin. That's why he used his /etc/hosts file to block these sites in the first place.

His solution to this problem was to simply set up a Web server locally, and to manually create local JavaScript scripts that stubbed out the minimal functionality needed for the problematic Web sites to work. So regardless of the browser he was using, any requests for the blocked hosts were sent to his local Web server, and his minimal version of the script was served up instead. This resolved the problem with the missing functions or variables, prevented the Web sites' JavaScript code from crashing, and allowed him to use those sites.
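I don't know what server he actually ran, but the idea can be sketched with nothing more than Python's standard http.server module. The path /js/tracker.js, the tracker object and the stub contents below are all hypothetical, made up purely for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stubs: request path -> minimal JavaScript that defines
# just enough for the dependent site's own scripts to keep running.
STUBS = {
    "/js/tracker.js": "var tracker = { log: function () {} };",
}


def stub_for(path):
    """Return the stub body for a request path, or an empty script."""
    # Ignore any query string so /js/tracker.js?v=3 still matches.
    return STUBS.get(path.split("?", 1)[0], "")


class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Always answer 200 so the page never sees a failed script load.
        body = stub_for(self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/javascript")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=80):
    # Binding to port 80 normally requires elevated privileges.
    HTTPServer(("127.0.0.1", port), StubHandler).serve_forever()
```

Calling serve() starts the server. With the blocked host names pointed at 127.0.0.1 in /etc/hosts, plain HTTP requests for those hosts from any browser would then land on this handler.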

I'd asked him if he ever ran into problems with two separate blocked hosts that both used the same JavaScript filename for different code. He said it wasn't something he'd ever encountered. If it ever did arise, he said he'd just include both script stubs in the same local file, and chances are there wouldn't be a conflict.
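A small sketch of that fallback, with hypothetical host names and stub contents: the stubs for every host that expects the same filename are simply concatenated into one script body.

```python
def combine_stubs(stubs_by_host):
    """Join the stubs for several blocked hosts into one script body."""
    parts = []
    for host in sorted(stubs_by_host):
        # A comment marks where each host's stub begins.
        parts.append("/* stub for %s */\n%s" % (host, stubs_by_host[host]))
    return "\n".join(parts)


# Hypothetical hosts that both expect /js/api.js, with different contents.
SAME_NAME_STUBS = {
    "ads.example.com": "var adApi = { show: function () {} };",
    "stats.example.net": "function trackPage() {}",
}

combined = combine_stubs(SAME_NAME_STUBS)
```

This only works as long as the two stubs don't define the same global names. If they ever did, the local server could presumably pick the stub based on the Host header of the request instead, since the browser still sends the original host name.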

Permalink: http://pinderkent.phumblog.com/post/2009/01/a_more_generalized_replacement_for_noscripts_surrogate_scripts_functionality

It's rare to see a database schema as clean as MediaWiki's.

Posted on Saturday, January 24, 2009 at 3:36 PM.

Over the past week or so, I've seen links to this diagram of the MediaWiki database schema posted on a number of other sites and blogs. Now, MediaWiki is no minor piece of software. It is used to power Wikipedia, which Alexa currently states is one of the top ten most popular Web sites on the Internet.

At many of the companies I have consulted with or worked for, it is rare to find a document or diagram as precise and effective as the one for the MediaWiki database schema. It's even rarer to find a real-world database schema that is so consistent and sensible. Clearly, a great amount of care and experience has gone into the development of this database.

Even without studying it for very long, it is apparent that it is a rather clean database schema. One thing that is apparent almost immediately is the prefixing of the table columns. This is a technique often used to reduce the ambiguity between identically-named columns of separate tables when those tables are joined. It also helps make it clearer which table a given column comes from when looking at the results of a query.
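To illustrate the benefit, here is a small sketch using Python's sqlite3 module and a heavily simplified version of the page and revision tables. Only a handful of columns are included, the column types are my own assumption, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                           rev_comment TEXT);
    INSERT INTO page VALUES (1, 'Main_Page');
    INSERT INTO revision VALUES (10, 1, 'initial import');
""")

# No table qualifiers or aliases are needed: every column name is unique
# across both tables because of its prefix.
cursor = conn.execute(
    "SELECT page_id, page_title, rev_id, rev_comment "
    "FROM page JOIN revision ON rev_page = page_id"
)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Had both tables used a bare id column, the join would have needed qualifiers such as page.id and revision.id, and the result set would have carried two columns with the same name.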

The one minor inconsistency I do see is that the names of some tables contain words that are separated with an underscore (for example, user_groups, user_newtalk, page_restrictions and site_stats), while others do not separate the words (for example, filearchive, oldimage, pagelinks, searchindex, and recentchanges). At least the tables related to certain functionality tend to have similar naming conventions.

The normalization and the relationships between the tables appear to be quite sensible. The short descriptions, even if only a sentence or two long, do significantly help make it clear why a certain table exists. This is something that is lacking from many enterprise databases.

Performance is clearly an issue, so we see some caching-related tables in the top-right corner of the current database schema diagram. We see other tables, like job, testitem and testrun, that show that this is a real-world database where we have external batch jobs, as well as testing. This isn't just some academic exercise dreamed up in a classroom where the necessities of the real world are not to be found.

Any piece of software backed by a database should strive to have a similar schema diagram. And it needs to be kept up-to-date as changes are made to the database. This single schema diagram teaches us much about the MediaWiki architecture, but it also speaks to the quality of the product. Software that is built upon a sensible data model often proves to be very natural, and thus in many cases easier to develop, which often leads to a higher-quality product. If a production database schema can be kept as clean as that of MediaWiki, then we'll likely see greater programmer productivity, and higher software quality.

Permalink: http://pinderkent.phumblog.com/post/2009/01/its_rare_to_see_a_database_schema_as_clean_as_mediawikis

I still dislike JavaScript, and likely always will. It has some pretty fundamental flaws.

Posted on Saturday, January 24, 2009 at 2:56 PM.

There was a recent thread of discussion at Reddit asking who initially disliked JavaScript, but "later realized it's actually a pretty cool and very unique language." I must say, I disliked JavaScript the first time I had to work with it in the mid-1990s, and having used it extensively since then, I still have to admit that I dislike it.

The only interesting thing about JavaScript is its support for prototype-based programming. But in this regard, it is neither unique nor particularly innovative. Much of that credit would need to go to Self, which predates JavaScript by nearly a decade. JavaScript has borrowed ideas that others pioneered, but unfortunately many programmers aren't aware of that, and are misled into thinking that JavaScript is the original source.

Aside from that, JavaScript doesn't offer anything spectacular. For instance, functional programming languages had been offering first-class functions for decades before JavaScript was developed. Likewise, its syntax is clearly based on that of C, which itself has been around for quite a while. Its regex support is unremarkable. And typically being embedded within a Web browser is more of a hindrance for it than anything else.

JavaScript does have many drawbacks. One of the most significant is that it is a dynamically-typed language. Some people mistakenly believe that this is a benefit. They say it allows them to develop their code faster. And in a sense, that's true. They may very well produce more code, but that's just because there's no compiler there to point out their numerous coding mistakes, and to force them to fix the mistakes before they can run their code. So the code they produce is heavily flawed, and we often end up with Web sites in production that have numerous problems that are only detected at runtime. Unfortunately for all involved, it's usually customers and clients who detect these problems.
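The same effect is easy to show in any dynamically-typed language. The sketch below uses Python rather than JavaScript, and the function is entirely made up, but JavaScript behaves much the same way when an undeclared name is read.

```python
def describe_order(quantity):
    if quantity > 0:
        return "%d items" % quantity
    # Bug: 'quantty' is misspelled. Nothing complains about it until
    # this rarely-taken branch is actually executed.
    return "empty order (%d)" % quantty


# The common case works, so the mistake can easily slip past testing.
ok = describe_order(3)

try:
    describe_order(0)
    failed_at_runtime = False
except NameError:
    # The misspelled name is only discovered here, at runtime.
    failed_at_runtime = True
```

A compiler for a statically-typed language would have rejected the misspelled name before the code ever ran.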

Another major problem with JavaScript is its performance. It probably doesn't help that most implementations are browser-based. Web browsers themselves aren't often known for being fast pieces of software. So it's not unexpected that a scripting language running within the browser is lacking in performance, as well. And it's only now, after nearly 15 years, that we're beginning to see better-performing JavaScript engines like Tamarin and V8 become available. That's an awfully long time in the world of computing.

Within the Reddit thread, many people said that their favorite things about JavaScript were libraries like jQuery, Prototype, and Ext JS. First of all, those have little to do with the JavaScript language itself. And they're not especially innovative, either. All they're doing is patching up the numerous flaws that make JavaScript-based and Web browser DOM-based development so awkward in the first place. They bring only an illusion of productivity, rather than actual productivity, because JavaScript-based development in the browser was so terrible and unhealthy to begin with.

As we can see, JavaScript can generally be considered a technological failure. That doesn't mean that it isn't widely used; it clearly is! But we should not mistakenly consider it to be either an innovative or a well-designed programming language. Of the people I know who like JavaScript, most of them are very ignorant about other programming languages and environments. JavaScript only seems acceptable because they don't realize how flawed it is in so many critical ways.

Permalink: http://pinderkent.phumblog.com/post/2009/01/i_still_dislike_javascript_and_likely_always_will_it_has_some_pretty_fundamental_flaws

Mapping concepts from one programming language to another.

Posted on Saturday, January 24, 2009 at 2:53 AM.

I read through an article today that suggested an idea for a Web site where a user can specify a task in a programming language they know, and the site tells them how to perform it in some other language.

This is an interesting idea, and no doubt could make for a useful site in some cases, but it would likely run into problems creating the relationships between languages. Even something as crucial as string handling can differ significantly between languages. C treats strings as arrays of characters. C++ treats strings similarly to C, but also has, for example, the std::string class for representing strings. Java treats strings as objects. Erlang treats strings as lists of integers.
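As a rough illustration, here is how two of those representations might be modelled in Python. This is only a sketch of the data layouts, not of how C or Erlang actually implement them.

```python
text = "hi!"

# Erlang-style: a list of integer character codes.
as_code_list = [ord(ch) for ch in text]

# C-style: an array of bytes ending in a NUL terminator.
as_c_array = bytearray(text.encode("ascii")) + b"\x00"

# Even "get the length" differs: the C-style array has to be scanned
# for the terminator, much as strlen() does.
c_length = as_c_array.index(0)
list_length = len(as_code_list)
```

Both lengths come out the same here, but the cost and the failure modes of the operation are quite different, which is exactly the kind of detail such a site would have to spell out.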

So operations that are available in one language may not really have an immediate equivalent in another language. Likewise, the approach taken when using one language may be completely inappropriate, in terms of performance or memory usage, when using another language. Such a Web site would need to ensure that these differences and the potential risks were clearly stated. But otherwise, I think it might actually be useful, even if the article suggesting it was somewhat in jest.

Permalink: http://pinderkent.phumblog.com/post/2009/01/mapping_concepts_from_one_programming_language_to_another

Why not just write the SQL code?

Posted on Saturday, January 24, 2009 at 2:28 AM.

Lately, there has been a trend towards taking a high-level language, and using it to write SQL-like queries. In the .NET world, the best-known example is LINQ. Today I saw a similar, albeit much simpler, approach for Python.

These approaches differ from many ORM systems in that we're still thinking in terms of raw queries, and not as much in terms of objects and the relationships between them, and how the data maps to tables and the relationships map to constraints. Likewise, they differ from SQL embedded within another language, in that we don't actually write much, if any, SQL code.

But after looking at the example Python code, I'm not sure I understand how it is better than just writing the SQL code in the first place. We're still making use of most of the same concepts as in SQL. The general form of the query is the same. We're selecting rows containing certain columns from joined tables or subqueries, subject to certain filters, and sorted in a particular order. There's nothing remarkable happening there.

If anything, we end up writing queries that are nearly identical to the equivalent SQL, but instead using a syntax that is more awkward to work with than SQL is. And we don't gain any extra functionality or abilities in exchange for this loss of developer productivity. We don't benefit in any way by avoiding SQL code, and instead just writing Python code that is virtually identical to that SQL code, but with a more awkward syntax.

Most decent programming languages offer some way to easily interface with a wide variety of database servers, typically also offering easy ways to parameterize the SQL code passed to the database to increase the level of security. So for the vast, vast majority of database queries, we should be able to put the SQL code inline, parameterize it, and pass in the parameter values when we execute the query. We should never have to resort to string concatenation, or anything of that sort. We just write pure SQL, within a string in the host language. It's as simple as that. There is absolutely no need for the SQL-like adaptations of the syntaxes of non-SQL programming languages.
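For example, with Python's standard sqlite3 module it might look something like the following. The customer table and its data are made up for illustration, and other database drivers may use a different placeholder style than the question mark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ann", "Toronto"), ("Bob", "Ottawa"), ("Cy", "Toronto")],
)


def customers_in(city):
    # Plain SQL in a string; the city value is bound by the driver,
    # never concatenated into the query text.
    sql = "SELECT name FROM customer WHERE city = ? ORDER BY name"
    return [row[0] for row in conn.execute(sql, (city,))]


toronto = customers_in("Toronto")
# A hostile value is treated as data, so it simply matches no rows.
injected = customers_in("Toronto' OR '1'='1")
```

The query is ordinary SQL that could be pasted straight into a database console, and the parameter binding takes care of the security concern.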

Permalink: http://pinderkent.phumblog.com/post/2009/01/why_not_just_write_the_sql_code