The Unorthodox Engineers: June 2011

Tuesday, June 28, 2011

The Inevitable

So I ran the admin panel in IE9 for a little cross-browser testing, and was horrified by the results. The empty white screen turns out to be due to one extra comma on line 187, but I'll have to be back on the development system to make that fix. Netscape and Chrome don't care about the extra comma, and I really should have some extra syntax checking going on. I just sometimes forget how contrary IE likes to be with its Java.. *cough* ECMAScript.
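The "extra syntax checking" could start out as crude as the sketch below. It is only an illustration: the function name findTrailingCommas and the sample source are made up for this example, and the regular expression knows nothing about strings, comments or regex literals, so a real lint tool is the better long-term answer.

```javascript
// Naive trailing-comma check: flags a comma that is followed only by
// whitespace and then a closing brace or bracket. Hits are hints, not
// verdicts, because strings and comments are not understood.
function findTrailingCommas(source) {
    var pattern = /,\s*[}\]]/g;
    var hits = [];
    var match;
    while ((match = pattern.exec(source)) !== null) {
        // Line number (1-based) of the offending comma.
        var line = source.slice(0, match.index).split('\n').length;
        hits.push(line);
    }
    return hits;
}

// Hypothetical snippet with the kind of comma that upsets stricter parsers.
var sample = [
    'var options = {',
    '    width: 100,',
    '    height: 50,',
    '};'
].join('\n');

// findTrailingCommas(sample) returns [3]
```

Run over a script file before deployment, something like this would have pointed straight at the offending line.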

It's the only 'standard' that Microsoft refuses to extend and improve; here, they remain committed to the absolute letter of the spec. Apparently it was done out of spite, likely in the expectation that freezing in all the worst features of the language would kill it off in favor of ActiveX or .NET or whatever was out that week.

Funny story: Javascript won. And because Microsoft wouldn't let them fix the language, it remained broken, and won anyway. There's a lesson in there somewhere, but nobody agrees what it is.

It's especially funny in light of this week's kerfuffle about how much HTML5 and Javascript (I mean ECMAScript, sorry) are going to be driving the user interface of the next version of Windows.

The thing very few programmers understand is Why? Why on earth would you throw away structured OO languages and IDEs for the most hostile programming environment yet created? And it's because the Browser is the most advanced User Interface toolkit ever invented. Compare HTML5/CSS3 against Swing or Silverlight or GTK or any UI framework. When it comes to that last artful few transparent pixels in the face of variable data sizes and mixed content, HTML layouts work better than anything. From the consumer's point of view, this is most of the story. Prettier programs win.

So, HTML is the prettiest UI layout system, and therefore ANY language which deeply integrates with the DOM would probably be a winner. We're just lucky that Javascript has good bits, and does allow some clean design and clever code once you alter your methodologies.

This is the opposite of the way it's supposed to go. Good underlying design is supposed to bubble up to a generic interface. This other theory says it goes from the pixels in, and we programmers are forced to use 'good enough' technologies to support the animated .gifs. A bit of a blow to the pride, but if you accept it, there's a platform of billions of Javascript compilers out there, all waiting to run your code.

If you leave out the extra comma, that is.

Wednesday, June 22, 2011

Invention's Mother

I'm so very tired. Apparently coffee is not a long-term substitute for sleep after all. I ran a few successful tests of the distributor last night. The admin interface is coming together, and can manage the handful of configuration records. Messages are coming in and going out.

I nearly made a massive mistake in coding, by almost relying on the system date/time as a synchronization method among my unknown number of parallel instances.

I had made the assumption that within GAE, machine times would be synchronized to within (at least) the NTP ping drift of milliseconds. This is totally not the case. (One GAE server apparently managed to get 40 minutes out of synch, according to a Google engineer. Yeah.)

That's a shame, because properly synchronized clocks are hard to do, and not something we should be doing ourselves. It's slightly amusing that Google considers net access to be more important than knowing the time :-)

Memcache seems to be the preferred synchronization primitive, since it is basically a tiny networked database server with atomic operations. (Don't even consider the datastore for synchronization locks.) And the use of named tasks in queues seems to be another pro tip: job names can't be re-used (for a week) after being allocated, so proper naming means idempotence. GAE will say "I can't schedule task 'send email 1029302' because I already did it!" but unnamed tasks will happen as many times as they are scheduled, and this can be more often than you would expect or want.
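The named-task idea is easy to see in a toy model. The sketch below is not the App Engine task queue API; NamedTaskQueue, its add method and the task name are all invented for illustration, and the week-long expiry of names is left out. It only shows why a deterministic name turns "schedule this twice by accident" into a harmless no-op.

```javascript
// Toy in-memory model of a queue that refuses to reuse task names.
// This is NOT the App Engine API; names and behaviour are assumptions.
function NamedTaskQueue() {
    this.usedNames = {};   // name -> true once that task name has been taken
    this.pending = [];     // tasks accepted and waiting to run
}

// Returns true if the task was accepted, false if the name was already used.
NamedTaskQueue.prototype.add = function (name, payload) {
    if (Object.prototype.hasOwnProperty.call(this.usedNames, name)) {
        return false;      // duplicate: this job was already scheduled
    }
    this.usedNames[name] = true;
    this.pending.push({ name: name, payload: payload });
    return true;
};

var queue = new NamedTaskQueue();
// Derive the name from the job's identity, so a retry produces the same name.
var first = queue.add('send-email-1029302', { to: 'user@example.com' });
var second = queue.add('send-email-1029302', { to: 'user@example.com' });
// first is true, second is false, and only one task is pending.
```

The design choice that matters is deriving the name from the work itself (here a message id) rather than from a counter or a timestamp, so two parallel instances that both try to schedule the same job collide on the name.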

There is a very specific level of functionality needed in our application, and it wasn't decided on by committee or sales meetings or customer feedback. It's designed by necessity. We need this thing. We need it very, very badly. If we didn't need it so much, I wouldn't be taking time out of the primary project to do this.

There are a lot of bad reasons to develop software, such as chasing fame and fortune. Or even revenge. It rarely works. The best software comes out of need, from necessity.

Now that the prototype is complete, it will be put into service almost immediately. It will feed the other system we have been developing, the one we needed it for, and the two will grow up together. It will get stress tested, and attacked, and patched. That which does not kill it, will make it stronger.

Eventually I'll forget it's even there, just like all good plumbing.

Somewhere along the way, when it's stopped being exciting to use the application (like dancing through a minefield at the moment) it will be ready to unleash on the world, and we'll see how many other people's problems it can solve.

It is by caffeine alone I set my mind in motion.
It is by the beans of java that thoughts acquire speed,
the hands acquire shaking, the shaking is a warning.
It is by caffeine alone I set my mind in motion.

Tuesday, June 21, 2011

First Post (for it, not me)

This is the first message processed by the distributor, as shown by the admin web interface, although a bug in the code has prevented the 'value' column from showing anything on screen. (It's there in the datastore, though.) This is what software looks like when it's barely working.

First Post!

It will be interesting to compare this screenshot with what it eventually becomes, like baby pictures. My programmer's view of proceedings is quite a bit different, and tends to look like this:

The way I see it

It's the debugger console inside Google Chrome, although you'd see much the same thing in Firebug or, I hope, IE's new debugger. This is the 'command line' hidden inside every browser. Learn its secrets.

Here is my debug function to help you along. It took me quite a while to get it right, and handle all the stupid browser cases. Anything less than these checks (like using console.debug() directly) means your code works fine on your machine, but instantly fails when run on any browser that doesn't have a debugger installed.

// system-wide debug function
function _debug(x) {
    try {
        // only log if a console exists and actually offers debug()
        if (typeof console !== 'undefined') {
            if (typeof console.debug === 'function') {
                console.debug(x);
            }
        }
    } catch (err) { }
}

I like to invoke it in code like this:

_debug(['function i am in', that_thing, other_thing]);

Passing an array gets a bunch of useful things onto one log 'line'. It's much more useful than message boxes, and it doesn't matter if I forget and leave the debug statements in. If the user doesn't have the console open, they won't see the messages.

It's fun to have the console open (especially the network tab) when going to major sites like Facebook and GMail. You can watch all the AJAX calls, and see where your bandwidth is going. Some sites are elegantly efficient in their data transfers. Others are more like a swan... all stately grace above the surface, but with furious feet paddling away underneath.

One thing you'll quickly see: none of the major sites are passive ol' HTML pages anymore. The pages watch you back.

They know what you've been browsing, they know when you're awake.
They know your language and your zone, for the interface's sake.

Iconify

I needed an Icon for the project, something to stick in the corner of the Admin page. So here it is; the perfect visual metaphor (for now) of what we're building.

It's a distributor assembly, in case you're wondering. That's the input gearing on the left, and the distributor cap on the right. I coloured it a little. Hopefully the icon evokes these ideas:

We're writing a distributor.
It's for Google App Engine
It's intended to be a component of a larger system
It's pretty neat, although most of that is hidden inside.
The first time you see it, you think it might be a spaceship.

The problem is, now that the project has been given a metaphor, it's hard to resist it. I've caught myself wondering whether I should rename classes to 'Plugs' and 'Points'. At the moment those classes are named 'Filters' and 'Tracks', because what happens inside the distributor has a much better visual metaphor:

And there's the problem with metaphors, and icons; at some point the conceit breaks down, the analogy gets pushed too far, and it stops helping. In fact, it can freeze people's thinking and give them... sorry... one track minds.

Case in point: the original Macintosh OS, where you had to drag a disk into the trash to eject it. Obviously.

So why do we use icons and visual metaphors at all, if they are ultimately flawed? "Draw me a picture of the software"? I may as well ask you for an interpretive dance, for all the relevance it has to the original form. The only visual representation of the program that truly matters is this one:

But it does lack something in the way of brevity and wit. Hard to put on a T-shirt.

Perhaps that's all icons are; attempts at witty visual puns. That would mean the ultimate compliment for an icon is the same - People who don't know what it represents, once told, should groan loudly and emphatically at the terrible joke.

Icons are going through a bit of a recession at the moment, not like the profligate WIMP-fueled software binges of the late '90s, where every TreeView and ListView had a palette of pictures. Every menu item had its own little abstract work of art beside it. ("Ahh.. I see... It's a man with an umbrella leaning into the wind, clearly the Firewall Settings option.")

If you take GMail to be an exemplar of the modern web GUI, there are barely any icons at all! Buttons and menus have been stripped down to bare text, because it turns out that the shape of a word can be just as iconic as a 16x16 block of pixels. Perhaps more so, since people tend to recognize them without being primed. Meaning comes pre-attached, rather than having to be learned.

Partly this is because people are more familiar with computers than ever, and no longer need metaphors for the simple operations. 'Cut', 'Copy' and 'Paste' are no longer abstract terms from the editor's desk, they're just taken for granted. We've forgotten where the 'Tab' key came from, because computers are not really typewriters, no matter how hard a generation pretended.

Computers are computers. They are not like anything else. That's their greatness.

I believe in 'honest' user interfaces which try to reveal the inner workings of the problem, while giving you tools to help solve it. The bad interfaces are the ones that hide, and obfuscate, and pretend (usually in the name of 'user friendliness', and designed by someone in sales).

The worst ones put that condescending 'please wait...' message on the screen while thoroughly trashing your hard drive, on the basis that you don't need to know what the program is doing, because it's smarter than you. And you are allowed no way to stop it. Bloody Wizards.

Computers are machines. So if they are like anything else, it would probably be a photocopier. Sometimes you just have to pull it open and rip out the crumpled, smoking shreds of paper before it will work properly again.

Once again this topic was covered by the Master, years ago:

"The difference between something that can go wrong and something that can't possibly go wrong is that when something that can't possibly go wrong goes wrong it usually turns out to be impossible to get at or repair." - Douglas Adams

Sunday, June 19, 2011

940 (Java)

I've been busy. I just counted, and there are 940 new lines of Java code that weren't there two days ago. There are also 1160 lines of JavaScript, but probably 80% of that is copypasta from other projects. And some CSS I wasn't bothering to mention.

That's quite a lot of new code, even for me, especially in an environment that I'm still a little unsure of. But once I got rolling it all sort of came back. (A nice trick, since most of this stuff doesn't get much more cutting edge, but there's a distinctly retro feel about many parts of the App Engine.)

It's been a good day. I have a prototype. Or perhaps I should put that in air quotes, as in "prototype", because I'm not sure it's advanced enough to deserve the appellation. I'm still driving most of its features from the browser's command line, and most people don't even know that their browser has a command line.

There are moments when you realize that, against all the odds, it's really going to work. Today had some of those, as the first messages were serialized, queued, and resent by the daemons. That actually turned out to be the easy part. The hard part (as always) is writing the command interface, because you have to connect two worlds in a meaningful way.

In this case, the two worlds are the web server environment (Java servlets in the GAE) and the web browser, which means JavaScript. Despite the shared name the two languages are incredibly different, and so I've been using a third language - JSON - to bridge the two.

On the server side, data is stored within Java Data Objects (JDOs) by a Persistence Manager. To say this is not your average SQL relational database is an understatement. I believe it's technically an Object Oriented Database, although it probably breaks that category in significant ways too.

The data in these Java objects is ultimately used by Javascript executing in the browser, so our modern 'layer stack' (if you want to brush off your OSI) goes like this:

JavaScript Code
Interpreter
Browser
HTTP / AJAX / JSON
App Engine
Java Virtual Machine
Java Code

As a programmer, I have to accept that I have absolutely no control over the middle five layers. So I've been doing what I always do in this situation: see how wide a communications channel I can open up between the two ends that I do control.

That means AJAX, which is just JavaScript making HTTP calls back to its originating web server using the same protocols as form submissions. Nearly every browser has now supported this for years. And as much as I like XML, parser encoding incompatibility is constantly killing umlauts and other special characters. JSON is the essence of XML without the namespace complications. It's so simple that practically every language has a parser.

Let's say you have lots of bits of data on your server, and you're using AJAX not only to retrieve it, but to update all those little checkboxes and filenames and option settings. That's when you discover most browsers limit the number of simultaneous AJAX connections to as few as 2-4. How many exactly? It depends on the browser.

Each AJAX call can take a few seconds. So a flurry of clicking can cause a few minutes' worth of AJAX calls before the system properly responds again. Not counting all the widgets which want to update regularly. Not great.

The solution is to write a 'multiplexer' which takes over the precious communications channel resources (the low number of simultaneous AJAX connections) and shares them among the many needy clients in a way that improves life for everyone. It can send all pending requests in a single batch, and get all the responses in one go. In theory only one request has to be in flight at any one time.
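A minimal sketch of that batching idea follows. It is not the multiplexer from this project: the Multiplexer constructor, the transport(payloads, done) signature and the fake transport are all assumptions made for the example. The transport is assumed to send an array of requests in one HTTP call and to call done with one response per request, in the same order.

```javascript
// Hypothetical batching multiplexer: callers queue requests, and at most
// one batch is ever in flight over the (assumed) transport function.
function Multiplexer(transport) {
    this.transport = transport;
    this.queue = [];        // requests waiting for the next batch
    this.inFlight = false;  // true while a batch is on the wire
}

Multiplexer.prototype.request = function (payload, callback) {
    this.queue.push({ payload: payload, callback: callback });
    this.flush();
};

Multiplexer.prototype.flush = function () {
    if (this.inFlight || this.queue.length === 0) return;
    var batch = this.queue;   // take everything that is pending
    this.queue = [];
    this.inFlight = true;
    var self = this;
    var payloads = batch.map(function (item) { return item.payload; });
    this.transport(payloads, function (responses) {
        self.inFlight = false;
        batch.forEach(function (item, i) {
            if (typeof item.callback === 'function') item.callback(responses[i]);
        });
        self.flush();         // send whatever queued up in the meantime
    });
};

// Fake transport for demonstration: records each batch and lets us
// complete it by hand, echoing every payload back as its response.
var sentBatches = [];
var completions = [];
function fakeTransport(payloads, done) {
    sentBatches.push(payloads);
    completions.push(function () {
        done(payloads.map(function (p) { return { echo: p }; }));
    });
}

var mux = new Multiplexer(fakeTransport);
var results = [];
mux.request('a', function (r) { results.push(r.echo); });
mux.request('b', function (r) { results.push(r.echo); });
mux.request('c', function (r) { results.push(r.echo); });
completions[0]();   // first batch returns; 'b' and 'c' then go out together
completions[1]();
// sentBatches is [['a'], ['b', 'c']] and results is ['a', 'b', 'c']
```

In this version the first request goes out alone and everything that arrives while it is in flight rides along in the next batch. In a browser one might instead delay the first flush by a few milliseconds so that a flurry of clicks shares a single call.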

I've written a concept system before, but that was to connect JavaScript to PHP / MySQL. GAE Java is a totally different environment, so I started again from scratch. Mostly I'm exploring how App Engine does its thing.

Which pretty much explains where the thousand lines of code came from. My multiplexer is working. I can now query and update data inside the GAE DataStore by typing commands into my browser's JavaScript debugger console. Commands like:

DATACACHE.write({
    type: 'option',
    keys: 13,
    data: { option_value: 'testing', option_desc: 'Option Description' }
});

or

DATACACHE.read({
    type: 'message',
    keys: 22,
    call: function(x) {
        _debug(['write option',x]);
    }
});

It's all quite basic, but for the moment that's exactly what I need. Basic access to a handful of configuration records. Now I just have to invent some pretty buttons and HTML controls to issue the commands instead.

Friday, June 17, 2011

Jackson and I

Now I need a JSON parser library suitable for App Engine. My needs are pretty modest: a few arrays of simple objects, mostly to handle message lists. I do the usual thing, and look around the internet for a library.

I've written parsers before, lots of them, and there are a couple of ways to go about it. Some parsers care about speed, some care about good error messages, some care about helpful data conversion. It's important to pick one that matches your problem space. In this case, I'm intending to parse lots of JSON coming both from the Internet over http, as well as strings stored securely in the database. That means speed and robustness (against attacks) while being lax about things like error reporting. If the AJAX string doesn't parse because of a network burp, it just retries: it's not going to bother the user with debug screens. In fact, too much user error reporting is a security leak for what I'm going to be doing.

I'd like something well maintained, with all the bugs worked out. That's always the great wish.

So I come across Jackson: [ Jackson JSON Processor | Jackson In Five Minutes ]

It's been released for about two years now, and has gone through the expected flurry of patches as it settles in. A couple of other major OSS projects (eg. Spring) use it as their JSON library, so it's presumably running in a lot of places already. The features read like a personal wishlist:

Streaming (reading, writing)
FAST (measured to be faster than any other Java json parser and data binder)
Powerful (full data binding for common JDK classes as well as any Java bean class, Collection, Map or Enum)
Zero-dependency (does not rely on other packages beyond JDK)
Open Source (LGPL or AL)
Fully conformant
Extremely configurable

I downloaded the version 1.8.2 tgz file and used Eclipse to import the packages into my App Engine project (straight out of the archive file, nice one Eclipse!). I was instantly grateful I grabbed the source release rather than pre-packaged JARs, because there are some methods on the parser which read and write FileStreams, and GAE blacklists those classes.

At the source level, it's an obvious compile error/warning. I just comment those few constructors and methods out, and everything comes up green in the editor. I wouldn't have been able to do that with the JARs. And the Google tools for Eclipse integrate so well that this was all caught by the IDE before I even did an explicit compile. Probably saved me an hour of screaming frustration right there.

The File-based methods excise cleanly, and the one missing dependency ( the "org.joda.time" libraries, so much for "zero dependency" :-) just needs renaming to "com.google.appengine.repackaged.org.joda.time" now that we're inside GAE, a fix suggested by the editor itself.

I like these new tools. I'm never going back to vi.

So Jackson passes my 'can I get it to compile' test in less than five minutes. Good. Therefore, it's time to read the documentation and try some basic operations.

All software has a personality. Mostly it comes from the programmer, but some comes from the problem itself. Jackson, to me, is an elder Australian tradesman, with a wide toolbelt and depth of experience:

"What's yer problem, mate?"
"I've got some JSON I need to decode."
"Ah, yeah. Much to do?"
"Just a few object lists."
"No worries. There's three ways we can do this."
"Three?"
"Yup. The first is, you tell me everything in advance, fill out the names of all the classes you expect to get, and I'll build up all the objects from scratch. It's called 'mapping'. Good stuff."
"What if I don't know what I'm going to get in advance?"
"Ah, that's option two. I just parse all the data into generic collections, maybe a tree if you like, lots of people like trees these days, and I give you a DOM thing when it's done."
"What's option three?"
"That's for our 'advanced' customers. I don't think ya want that one, not yet."
"What is it?"
"Well, third option is; I just shovel it at you as fast as it comes in, and it's up to you. No storage fees. Good for bulk jobs."
"I see. And you can encode back the other way?"
"Of course, mate. That's the easy part."

I think Jackson and I are going to get along just fine.

Because I'm used to PHP's server-side JSON, I made up a couple of wrappers to emulate those functions:

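// 'mapper' is an ObjectMapper field on the enclosing class
// (org.codehaus.jackson.map.ObjectMapper in Jackson 1.x), created lazily.

// PHP-style json_encode: returns null if the object can't be serialized.
protected String json_encode(Object x) {
    if (mapper == null) mapper = new ObjectMapper();
    try {
        return mapper.writeValueAsString(x);
    } catch (Exception e) {
        // deliberately swallowed; callers treat null as "could not encode"
    }
    return null;
}

// PHP-style json_decode: returns generic Maps, Lists and primitives,
// or null if the string doesn't parse.
protected Object json_decode(String x) {
    if (mapper == null) mapper = new ObjectMapper();
    try {
        return mapper.readValue(x, Object.class);
    } catch (Exception e) {
        // deliberately swallowed; callers treat null as "could not decode"
    }
    return null;
}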

This generic encode function turns out to be useful for debugging Java state. It's not so useful for direct encoding of JDO data, however, as there are a lot of intermediate objects that clutter up the serialization. Every 'Text' property gets a sub-object with a 'Value' property rather than just a simple string. I think Jackson knows how to deal with this, and the default serialization behavior can be overridden with some Java annotations. That would be useful.

I'll have to read some more. In fact, now that it's installed and working I can justify the time to read the documentation. Pity it's never the other way around.

Once more into the breach.

This is my third Google App Engine project. The first was a toy, the second performs an essential (but simple) business function, and so I have an understanding of the basics.

A common question is, "Where to start?"

In this case, I am writing from the administration webpanel down, for a couple of reasons:

A lot of features and capabilities are only accessible through the APIs. So custom code is the only good way of messing with things like queues and data storage.
Using the webpanel should be easy and fast, which means it needs the most work. Best to start now. There's a straightforward path from rough debug lists to shiny rounded CSS layouts as you put in the hours of tuning and pixel pushing.
I believe in "eat your own dogfood", which means using the tools myself during development to accomplish my own debugging tasks. It forces me to show errors and diagnostic information in helpful ways in the web interface, rather than hiding it in log files.
Done properly, it defines most of what becomes your system's public API. These days, most of what the web interface displays comes from structured AJAX calls back to the server to submit and acquire JSON strings. The difference between this and a well-structured public REST API is usually just the naming scheme.
I already have a bunch of code from other projects (like my Ajax Dialog package) and experience with webpanels. They're not a problem.
Nothing is nicer than a good webpanel. Sometimes they can even reach the level of "pretty", "shiny" or even "elegant". Users love that stuff.

This is the new cloud approach to software and its user interfaces. Web pages now instantiate thousands of lines of JavaScript before they show their first title banner, entire libraries of code like jQuery. And while Google has pushed the state of the art of browsers (which now run all this bloated crap with ease and aplomb) I am pained to admit that it was probably Facebook that made the conversion happen so quickly.

It's the old 'Killer App' story. Javascript has been around for years, but sadly neglected. Most people turned JavaScript off during the mid-2000s, when browsers were about as secure as liquorice suspenders. And they only turned it back on again last year to use Facebook and play Farmville.

If only we had known.

So there is now a reliable and consistent layer of JavaScript (I refuse to call it ECMAScript) spread across the entire internet, like a thick layer of strawberry jam on white bread. Sure there's still the occasional rough pip, but that just proves it's natural.

This is supposed to be a blog about code, not jam, so let's see some.

public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    ServletOutputStream out = resp.getOutputStream();
    // authenticate
    boolean auth = authenticate();
    if(auth) {
        out.print("<html><head>\n");
        out.print("<title>Administration</title>\n");
        out.print("\n");
        out.print("<script type=\"text/javascript\" src=\"https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js\"></script>\n");
        out.print("<script type=\"text/javascript\" src=\"/js/ipn-admin.js\"></script>\n");
        out.print("\n");
        out.print("<link href=\"/css/ipn-admin.css\" rel=\"stylesheet\" type=\"text/css\"></link>");
        out.print("");
        out.print("</head><body>\n");
        out.print("Welcome back "+this.user.getNickname()+"<br/>\n");

Ugh. I had to hand-format that. I'll have to find a better way if I want to start dumping vast blocks into here. Plus the colours aren't quite right, and the wrapping is awful, but you get the idea.

That's the top of most web pages generated from server scripts, right there. Load up jQuery, load up the local Javascript and CSS files (with the exceptions for IE) and then the page begins. All dependent on the user being authenticated, of course. I haven't even bothered to set a DOCTYPE yet; it works fine anyway.

All that is inside a Servlet, which is running in the App Engine. Hit the URL, and the page comes up. Lovely.

My next job after all that is to define my Java data objects, and create some basic AJAX scripts to manipulate them. Mostly GET scripts so I can test fresh code by just typing URLs into a browser. This is trivial stuff when your backend database is a MySQL relational database server, but App Engine doesn't have that; it has Java Data Objects.

JDO is actually better than SQL, especially in a massively distributed environment. But it sure takes some getting used to, and changes in your thinking. More on that soon.

Thursday, June 16, 2011

Everything is Broken

Nothing works. Even the web site's front page is now full of errors because we switched to PHP5 (needed for new stuff), thus breaking lots of old PHP4 code that was running behind the home page. Hmmm.

Oh well, later.

The Google App Engine User API is definitely a little lacking in some respects, such as a Java equivalent of the Python is_current_user_admin function. The new way seems to be using permissions in the web config to control access, but that's not really good enough if you have that usual situation where lots of people have 'access', but they're grouped into security levels.

The App Engine User API has no concept of groups. Therefore if you want anything more than binary allowed/denied for a pre-selected group, you need to write your own user management. That means keeping a list of users (in this case email address seems to be the primary key; it's the modern way) and their group associations.

Ah, screw it. I'll put a single allowed email address in the options set. They get to be admin. That will do for the moment, 'till I fix all the other broken things.

It's a new day, it's a new dawn, it's a new world.

What happens when you try to develop a piece of software in public?

Well, stage one pretty much has to be where no-one knows exactly what you're doing except for some generously vague hand-waving. "We're writing some software. It's going to be grand!" That's because you have little idea, if you're honest, how you're going to get there.

Oh, you've got the general idea, some sample code, and a backpack full of hopes... but as any hiker will tell you, the mountain has its own plans.

This time, however, let's see what happens when the climb is blogged.

Software has a life cycle. It's born from ideas, grows with work, lives through use, and dies from neglect. The process can take years, decades, sometimes forever. Or it can be quick. I've seen both, many times.