
Many universities have active careers centres where soon-to-be graduates are matched with employers, keeping companies, universities and graduates happy. In general, this is most likely a good thing.

Problems arise when employers from technology companies are mixed with the careers centres of technological institutes. This was demonstrated yesterday when I received a "Vacancy for a Developer/Programmer" from Veenus Group.

A recruiter has the task of finding the most suitable candidate for a given opening. Recruiters are just salesmen punting jobs. Like all salespeople, their aim is to get the greatest value for the client at the least cost to the client. They also need to sell as many jobs as possible, and to justify to the client why each candidate is suitable.

In a large organisation that employs programmers but where producing software is not the main focus, the people in management are not programmers and never were. They are therefore in the worst possible position to make good decisions about hiring programmers, because they know nothing of what is required of a good programmer.

So, we have salesmen with targets answering to management who don't know good from bad. Putting together all these factors and adding the IT industry to the mix results in a very messy situation.

Let's digress a bit: what makes a good programmer? Most people in the audience are normally rather quiet at this point because, quite simply, not many people know. It's one of those questions where the answer is really obvious when you know it but otherwise beyond reach.

What makes a good web developer? That's a lot easier to answer, and you'll get calls of 'HTML', 'CSS', 'Dot Net' from the audience. The occasional brave soul will call out 'Ruby on Rails' (the same people calling out 'Java' five years previously). 'XML', shout the old timers.

Perhaps we'll hear 'configuring IIS' or 'installing Apache' as the most obvious ideas run thin on the ground. Mixed in with all this, you'll find the salesman recruiter strafing the audience with acronyms and abbreviations like a technological machine gun spitting red-hot bullet points.

What makes a good Windows developer? That's not too tricky to answer either, although since it's not quite as cool as the whole Software as a Service fad, you might get a few mumblings of 'Win32 APIs!', 'VB forms' and so on.

Amidst all the suggestions, you again find the salesman recruiter, sharply suited with a fashionably shiny pink tie, finely heeled in glossy leather shoes coming to an unnaturally long point at the toe, killing the atmosphere with cannonballs of the latest and greatest abbreviations there are.

There will be a minority sitting at the back on the left, quietly sipping their drinks with ever broadening grins, as they listen to the unending stream of wrong answers.

The salesman recruiter, sitting in the audience midway back from the stage and to the right, now so excited and frenzied he's almost at the point of initiating both his own standing ovation and a Mexican wave, drives the audience away in perceptible ripples as he approaches bursting point. Twenty-seven acronyms per breath and one red face later, he reclaims his island seat.

Clearly the equation "Good Programmer = X" is not satisfied by "X = long stream of tech industry acronyms and abbreviation jargon". Good Web Developer != HTML+CSS+PHP+Perl+Python+C#.

In the world of quantum mechanics (and bear with me, for this is relevant) there is, at any point in time, a certain understanding of how the building blocks of the universe work to a certain level of really really small smallness. Of all mathematical models put forth to describe whatever is the latest level of understanding, the one most simple, balanced, symmetrical and beautiful is, time and time again, the one that is most right.

Whether this is true of other aspects of life I couldn't say, but the "Good Programmer = " equation certainly does follow the same pattern.

One of the first scientists to actively investigate this problem was Joel Spolsky. The success of his long running experiment in the shape of Fog Creek Software empirically proves that Good Programmer = Smart + Gets Things Done:

People who are Smart but don't Get Things Done often have PhDs and work in big companies where nobody listens to them because they are completely impractical. They would rather mull over something academic about a problem rather than ship on time.

...

People who Get Things Done but are not Smart will do stupid things, seemingly without thinking about them, and somebody else will have to come clean up their mess later.

...

Any skill set that people can bring to the job will be technologically obsolete in a couple of years, anyway, so it's better to hire people that are going to be able to learn any new technology rather than people who happen to know how to make JDBC talk to a MySQL database right this minute.

Solutions are the output of processing two streams of input. One input is the problem, with all its constraints and pleasant quirks. The other stream of input is factual knowledge. Some factual knowledge will inevitably lodge itself in my head if I use it often enough; the rest doesn't stick, because it doesn't need to.

If I know how to design software well, I can read a book detailing the syntax of a given programming language and then implement that design in that language. If I know how to program in a given language but don't really know how to design software irrespective of the language, I'm completely stuck when a new language comes out. Knowledge of the facts is pretty irrelevant.

After that short digression, I'll get back to the point.

On the one hand we have management within a company who, by definition, can't make a good decision on who to hire. On the other hand we have a salesman selling the vacancy and answering to those same people. Management likes things short, snappy and to the point so that time for the lunchtime golf session does not go to waste. Management wants checklists, metrics and measurements of success and failure.

Tell such management types that the ideal candidate is someone who's smart and gets things done and you'll be asked to measure that quantitatively on a business-card-sized scrap of paper that can be understood in five minutes.

It's no wonder that recruiters, in trying to find the most suitable employee, match the acronyms of the candidates with the acronyms of the ideal person. Jim gets 8/10, Bill gets 6/10. Quick, easy, hire Jim, kill Bill. Time for golf.

But these flash kids coming out of universities these days have whole streams of acronyms to pick from. Why ask for just ten? Jim gets 8/10, but Ted gets 12/15. Surely the more acronyms you can squeeze out of a candidate the better? Why stop at just fifteen? Why stop at just twenty?

Which brings us back to Veenus Group, who currently prefer candidates to have a "good understanding" of twenty-one key jargon words:

  • ASP.net
  • C#.net
  • VB.net
  • Java
  • PHP
  • MS SQL Server
  • MySQL
  • Servlets
  • JSP
  • HTML
  • JScript
  • DHTML
  • CSS
  • Zen Cart
  • Joomla!
  • Payment Gateway Integration
  • Action Scripting
  • SEO techniques
  • SEM
  • WEBMASTER Tools
  • Web analytics applications

At a meagre six months per jargon to be competent enough to be able to get things done without making a mess for everyone else, Veenus are looking for someone with 10.5 years of experience in a range of fairly diverse jargons. Let's for now ignore the ridiculous notion of asking this of fresh university graduates.
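Spelled out, the arithmetic behind that figure, taking the six-months-per-item estimate above as given:

$$
21 \times 6\ \text{months} = 126\ \text{months} = 10.5\ \text{years}
$$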

We can group the jargons into three or four related areas where one might be suitably specialised to be of some real value. Someone possessing a "good understanding" of all twenty-one jargon-heavy points is not going to be suitably specialised in any one of the three or four areas to be of real value. Good programmers are up to twenty times more productive than average programmers.

As an employer, do you really want to be hiring people who have an understanding so diverse that they're capable of adding, at best, an average level of value to what you produce? Do you want to publish such unrealistic expectations that you'll actively drive away those talented people who can be of most value to you? Keep doing so and you can't fail to lose every time.

As a potential employee, do you really want to work for a company that demonstrates such a lack of understanding of what is needed?

As an employer, cut that list of jargon out and look instead for people who are smart and get things done and you'll destroy the competition. If you really want the lists of jargon, keep it to a core four or five points, look for people who really know these technologies well and pay them four or five times as much. The overall amount you pay out in salaries will be significantly less, and the quality of your employees, and what you produce, will be significantly increased. You can't lose.

As a potential employee who is smart and gets things done, you'll be invaluable to a company that recognises this. Don't waste your days away in the monotony of a poorly-managed, slow-moving downward slope. Look on the job boards of Authentic Jobs, Joel on Software and 37Signals.

Unless, of course, you are so talented in so many areas that you have the skills not possessed by anyone else on the whole planet. In which case, go and apply for the position of Plutonium-Eating Martian Superhero currently open at Veenus.

Further reading

We use computers to help us solve problems. We write programs to tell computers how to carry out the tasks we determine to be most relevant to solving problems.

Program code makes a computer do things, therefore code is for computers. It's a seemingly logical conclusion, an easy mistake to make and certainly part of why programming is so very hard.

But the program code we write is for people, not computers. A computer doesn't understand Java, C#, PHP, Python or Ruby any more than it understands what you're reading right now. A computer understands machine code only, instructions encoded as little blips of voltage. Little blips of voltage go in and little blips of voltage come out. And that's all your computer will ever understand.

The problem is that people find it tricky to remember which sequence of voltage blips does what, since 01010010 looks little different from 01010100 most of the time. Writing anything more than something very trivial would take far too long, and making any change to a program would take just as long as writing it in the first place.

So programming languages came along and let people use simple English-like instructions to control computers. A few decades pass and our high-level languages more closely resemble natural language than they do machine-level instructions.

The more advanced our programming languages become, the more distanced they are from anything a computer can comprehend. The advancement in programming languages serves a single purpose: to let people write programs in a way that makes most sense to people.

With every line of code you write, ask yourself how well it can be understood by people. Because program code is for people, not computers. A computer will never read your code, only people will. A computer will never need to understand your code, only people will. A computer will never need to manage, modify and maintain your code, only people will.

Just as our software should be straightforward enough that a manual is not needed, so our code should be straightforward enough that a hefty set of documentation is not needed.

I know that car.leftFrontWheel().inflate() will inflate the car's left front wheel. I shouldn't have to ponder over some documentation before figuring out that car().convolutedMethod() does the same.

I know that bankAccount.deposit(20, 'GBP') will put £20 into my bank account and bankAccount.withdraw(20, 'GBP') will take it out again.

Forget Hungarian notation, it's plain to see that the variable currentBankAccountBalance will contain a floating point number, not a string, integer or boolean, and that numberOfBunniesInCage can't possibly be anything but an integer.
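To make the naming point concrete, here is a minimal Python sketch of the bank account example. The `BankAccount` class and everything inside it are hypothetical, written for illustration only. Python convention favours snake_case, so `bankAccount` becomes `bank_account` and the balance becomes `current_balance`; holding money as a `Decimal` rather than a float is my own assumption, since binary floating point can't represent most decimal amounts exactly.

```python
from __future__ import annotations

from decimal import Decimal


class BankAccount:
    """A minimal account whose method names say exactly what they do."""

    def __init__(self, currency: str = "GBP") -> None:
        self.currency = currency
        # Decimal avoids the rounding surprises of binary floating point.
        self.current_balance = Decimal("0")

    def deposit(self, amount: Decimal | int, currency: str) -> None:
        self._check_currency(currency)
        self.current_balance += Decimal(amount)

    def withdraw(self, amount: Decimal | int, currency: str) -> None:
        self._check_currency(currency)
        if Decimal(amount) > self.current_balance:
            raise ValueError("insufficient funds")
        self.current_balance -= Decimal(amount)

    def _check_currency(self, currency: str) -> None:
        if currency != self.currency:
            raise ValueError(f"account holds {self.currency}, not {currency}")


# No type prefixes needed: the names alone tell the reader what they hold.
bank_account = BankAccount("GBP")
bank_account.deposit(20, "GBP")   # put £20 in
bank_account.withdraw(20, "GBP")  # take it out again
number_of_bunnies_in_cage = 3     # obviously a whole-number count
```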

Forget comments that tell me what the code does. Explain it with the code itself. Make the code describe what it does, saving comments for describing why you chose a certain approach or that a seemingly superfluous line of code is actually a workaround for a bug and not to be removed.
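A small, hypothetical Python illustration of the difference: the function name already says what the code does, so a comment such as "divide the total by the count" would add nothing. The one comment that remains explains why the empty-list check is there. The function and the dashboard it mentions are invented for the example.

```python
from __future__ import annotations


def average_response_time_ms(response_times_ms: list[float]) -> float:
    # Why: before the first request arrives the list is legitimately empty,
    # and the (hypothetical) dashboard should show 0 rather than crash on
    # a division by zero.
    if not response_times_ms:
        return 0.0
    return sum(response_times_ms) / len(response_times_ms)
```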

Write your code for people to understand, not just programmers and certainly not computers. Make your code more readable and more understandable. Since the majority of time spent with your code will be when other people are maintaining it, code written for people will result in lower development costs, lower maintenance costs, lower bug fixing costs and happier developers.

As with any writing, it's essential to keep your audience in mind. You know, deep down, that your code is not for computers. But since your code is certainly used by computers more frequently than by anything else, it's easy to miss the point and forget that code is for people.

I received an email from Crazy Egg (web site analytics software) telling me that although they no longer offer free accounts, I previously registered when free accounts were around and so could still use my free account.

This was nice, but my password was a mystery. So I went through the process of resetting it.

Your password has been reset. You may log in with your new password (the one you just created)

I may now do what? If you consider what information I have provided up until this point, you'll spot the flaw in the process.

  1. I entered my email address in a form
  2. I clicked a (uniquely-identifying) link in an email
  3. I chose a new password

And now I can choose to log in. At this point, Crazy Egg know who I am (I entered my email address) and know I'm me (I clicked a uniquely-identifying link in an email sent to me) and know my password (I just chose it).

They know who I am, that I'm not lying about it, and what my password is. I couldn't have provided a more thorough proof that I'm me. Why can't I be logged in automatically at this point? It's not a trick question and I'm not, in this instance, being overly pedantic.

This serves as a good example of why it is essential to reduce the number of steps in any process to the absolute minimum whilst also asking the user for the absolute minimum.

Doing so makes it easier for the user (they'll like you more) and increases the chances of the process being fully completed and the user actually using your web site instead of, in this case, pondering over the usability of a password reset process and forgetting about everything else.

Let's review the process and consider how it could have been easier. What's the current process?

  1. enter email address in password reset form
  2. click link in email
  3. choose new password
  4. click log in link
  5. enter email address
  6. enter password (possibly involving returning to step 1 and starting again)
  7. click login button

Seven steps, and we finally get access to what we want. In practice, I never completed step four, as I wasn't really that bothered anyway. Had there been no step four, I would have been in the service and might, having seen what it had to offer, have been bothered. But it's too late now; I've lost interest. Web users are particularly fickle.

The good news is that we can get rid of everything from step four onwards, giving us the chance to hold the interest of us fickle web users just that little bit longer.

  1. enter email address in password reset form
  2. click link in email
  3. choose new password
  4. click log in link (no longer needed)
  5. enter email address (no longer needed)
  6. enter password, possibly involving returning to step 1 and starting again (no longer needed)
  7. click login button (no longer needed)

Right after step 3, Crazy Egg knows all that is required to take me straight to my account. Resetting my password is a means to an end and all I really want is access to my account. I've done my bit, now do yours.

I never particularly enjoy resetting passwords. I don't explicitly not enjoy the process, but it's hardly engaging. I doubt anyone is thrilled by the act of resetting a password - it's just a required something that happens from time to time.

So when making Hosting Reborn, I spent some time thinking about how to reduce the password reset process to an absolute minimum by applying three simple concepts.

  • reduce the number of process steps to the bare minimum
  • reduce the amount of user input to the bare minimum
  • assume as many defaults as is sensibly possible

We can reduce the number of process steps by asking the user to do less and by doing as much as we can based on what the user has already told us. This means that after the user has identified themselves and chosen a new password, we should take them straight to their account, because we can and because that's the whole point of resetting your password.
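A minimal Python sketch of that idea, under stated assumptions: the in-memory `users`, `pending_resets` and `sessions` dictionaries stand in for a real database, sending the email is left out, and the one-hour link lifetime is an arbitrary choice. None of this is Hosting Reborn's or Crazy Egg's actual code. The point is the end of `complete_reset`: once the emailed token has proved who the user is and they have chosen a password, it opens a session straight away instead of sending them back to a login form.

```python
from __future__ import annotations

import hashlib
import secrets
import time

RESET_TOKEN_LIFETIME_SECONDS = 60 * 60  # assumption: links expire after an hour

# Hypothetical in-memory stores standing in for a real database.
users = {"me@example.com": {}}  # email -> stored credentials
pending_resets = {}             # token -> (email, time issued)
sessions = {}                   # session id -> email


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def request_reset(email: str) -> str | None:
    """Step 1: create a single-use token; a real service would email it as a link."""
    if email not in users:
        # A real service should respond identically either way, so it
        # doesn't reveal which addresses are registered.
        return None
    token = secrets.token_urlsafe(32)
    pending_resets[token] = (email, time.time())
    return token


def complete_reset(token: str, new_password: str) -> str:
    """Steps 2-3: the link identifies the user; store the password, then log them in."""
    entry = pending_resets.pop(token, None)  # pop makes the token single-use
    if entry is None:
        raise PermissionError("unknown or already-used reset link")
    email, issued_at = entry
    if time.time() - issued_at > RESET_TOKEN_LIFETIME_SECONDS:
        raise PermissionError("reset link has expired")
    salt = secrets.token_bytes(16)
    users[email] = {"salt": salt, "password_hash": hash_password(new_password, salt)}
    # No trip back to a login form: we already know who this is, so start a session.
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = email
    return session_id
```

Calling `complete_reset(request_reset("me@example.com"), "a new password")` returns a session id that the web layer could set as a cookie, so the user lands in their account in the same request.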

By applying some sensible defaults, we can reduce the amount of user input by 50%. I opted to choose the user's new password for them, so the only thing the user has to type is their email address. It remains to be seen whether this was the wrong choice, but it's done.

What we end up with is the bare minimum password reset process which gets the user to their goal - their account - with the least hassle and as quickly as possible.

  1. enter email address in password reset form
  2. click link in email
  3. confirm request to reset password

If you want to reset your Hosting Reborn password, you enter your email address, click a link in a subsequent email and confirm that you want to do it. That's it. You're in your account. Your new password is chosen for you and displayed so that you can make a note of it.
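Choosing a password on the user's behalf needs a cryptographically secure random source. Here is one way it might be done in Python with the standard `secrets` module. The 12-character length and the alphabet (letters and digits, minus the look-alikes `I`, `l`, `1`, `O` and `0`) are my own assumptions, not Hosting Reborn's actual settings.

```python
import secrets
import string

# Assumption: letters and digits only, minus characters that are easy to
# confuse on screen, so the displayed password is easy to copy down.
PASSWORD_ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits if c not in "Il1O0"
)


def generate_password(length: int = 12) -> str:
    """Pick a new password for the user using a cryptographically secure RNG."""
    return "".join(secrets.choice(PASSWORD_ALPHABET) for _ in range(length))


new_password = generate_password()
print(f"Your new password is {new_password} - make a note of it.")
```

Using `secrets` rather than `random` matters here: `random` is designed for simulations, not for values an attacker must not be able to predict.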

And since the process is fast and easy, you don't even have to bother yourself with remembering your password - resetting your password for a service you use infrequently is easier than having to remember a suitably-secure password. Having to remember nothing is very easy.

The password reset process for Hosting Reborn uses three steps instead of Crazy Egg's seven, roughly 60% fewer. That's not bad.

Of course, this might not be for you. Perhaps you have to let users pick their own password. You'd still get around a 40% reduction in process steps, which is still excellent.

Just remember to reduce everything to a bare minimum - leave no process step that could possibly be removed, ask the user for nothing that could feasibly be assumed or for which a sensible default could be applied.

Clearly this doesn't apply only to password resets. Do this everywhere you can. Ask the user for as little as possible and do as much as you can with that. And then do a little more.

Please participate in my two small, quick tests.

You get to enjoy clicking two things, waiting for two things to happen and saying which of the two was faster. It should take no more than one minute.

I'm researching the extent to which people can or can't notice the delay between an action and a response.

For example, you click a button on a website, wait a little, then something happens. I'm trying to determine how small that 'wait a little' has to be before it stops being noticeable.

This is part of a performance comparison of popular data serialisation technologies I'm currently conducting.

Running through the tests may indirectly make the world a better place.

XML is commonly used for web application messaging: sending information back to a browser from a web server, or sending information between web services. It's dead easy to do this and it works very well, hence XML has become the de facto choice for data exchange in web applications.

Alternatives such as YAML and JSON have found significant support in recent years. Both aim to be a more suitable alternative to XML in some cases.
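To see what the difference looks like on the wire, here is the same small, hypothetical record serialised as JSON and as XML using only Python's standard library (`json` and `xml.etree.ElementTree`). YAML is left out because Python's standard library has no YAML module; the third-party PyYAML package would be needed. The XML mapping used here (one child element per field) is just one of many possible schemas.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record of the kind a web service might send to a browser.
account = {"id": 42, "name": "Jim", "balance": "20.00"}

# JSON: compact separators, no extra whitespace.
as_json = json.dumps(account, separators=(",", ":"))

# XML: one child element per field, under an <account> root.
root = ET.Element("account")
for key, value in account.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"id":42,"name":"Jim","balance":"20.00"}
print(as_xml)   # <account><id>42</id><name>Jim</name><balance>20.00</balance></account>
print(len(as_json), len(as_xml))  # 40 70
```

Note that the XML version also loses type information: `42` comes back as the string `"42"` unless a schema says otherwise, whereas JSON keeps numbers as numbers.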

How much interest is there in knowing which is best? Let's see.

  • Google: xml vs yaml - 66 million
  • Google: xml vs json - 323 thousand
  • Google: yaml vs json - 1 million

Ok, so that's not an exact search. But it does suggest a huge amount of interest in a comparison between XML, YAML and JSON. (And no, Google, I didn't mean "xml vs xml" nor "yaml vs jason", but thanks anyway.)

What's the problem?

XML might not be the best choice in all cases, but that's no revelation.

Dare Obasanjo referred to JSON as being "another nail in the coffin of XML on the Web".

Tim Bray solved the problem for us 2 years ago.

David Megginson decided it all ends up looking like XML when you add a little complexity, but did note that:

JSON [has] the important advantage [of making] the most trivial cases easy to represent.

James Bennett reminds us that JSON works:

because most people don't really need all that overhead, and because it's often possible to do really interesting things with really simple formats

Even 6 years ago David Mertz pointed out "some situations where YAML provides a better object serialization format than XML".

And, of course, Dustin Diaz informed the masses that JSON was not only fast but so easy it'll make you sick.

There's no end to the argument, but there isn't much factual evidence either.

Ultimately, I think Jeff Atwood best sums up the gist of the issue.

I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it.

So we know XML is not ideal, and JSON or YAML may be better in some cases. JSON might be faster, YAML might be better (and more beautiful).

But in what cases would you go for one instead of the other? What benefits might you see and where? I want cold hard facts, numbers, charts and answers.

Who knows?

How much academic research has been done in the field? Let's see what journal articles have been published that compare XML, YAML or JSON in any way.

Ok, I'm getting desperate. The ACM and the IEEE are not small. They should have at least something of relevance.

Searching ... searching ... searching ...

Nope, turns out the ACM and IEEE journal archives contain nothing of direct relevance. There's even one article that relates to a completely different YAML.

Google Scholar, can you help?

Well, there is one academic article that explicitly compares XML, YAML and JSON (PDF, 200 KB).

It seems that both YAML and JSON are faster to encode for up to about 5000 elements, after which XML takes over. It also looks like both YAML and JSON require twice as much memory as XML when decoding. I couldn't determine whether the article relates this to real-world performance (the article is in Portuguese, which I don't read).

The point: not much academic research appears to have been undertaken and there's a huge amount of interest in some form of performance comparison.

There is no clear sign of any scientifically-arranged, repeatable, verifiable hard-evidence-based comparison. So I'll do just that.

Goodbye life for the next 2-3 months, and hello data object serialisation formats for the new world.

What will be studied?

I'll run some tests to determine which of the three technologies offers the least:

  • encode time
  • decode time
  • transmission time
  • overhead

The tests will be strictly scientific - I'll be doing my best to remove or minimise any influencing factors. Everything is going to be precise, exact and - most importantly - repeatable.
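A rough sketch, in Python, of how encode time, decode time and overhead (encoded size) might be measured for JSON and XML. The `to_xml`/`from_xml` schema, the 25 repetitions and the use of the median are assumptions for illustration, not the methodology of the study; transmission time would need a real network and isn't covered here. Taking the median of repeated runs timed with `time.perf_counter` is one simple way to dampen noise from other processes and make runs more repeatable.

```python
import json
import statistics
import time
import xml.etree.ElementTree as ET


def to_xml(records):
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    # Values come back as strings; a real codec would need a schema for types.
    return [{child.tag: child.text for child in item} for item in ET.fromstring(text)]


def median_seconds(func, arg, repeats=25):
    """Run func(arg) several times and return the median wall-clock duration."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def benchmark(records):
    codecs = {"json": (json.dumps, json.loads), "xml": (to_xml, from_xml)}
    results = {}
    for name, (encode, decode) in codecs.items():
        encoded = encode(records)  # encode once up front so decode has input
        results[name] = {
            "encode_s": median_seconds(encode, records),
            "decode_s": median_seconds(decode, encoded),
            "size_bytes": len(encoded.encode("utf-8")),  # overhead on the wire
        }
    return results


if __name__ == "__main__":
    sample = [{"id": i, "name": f"user{i}", "balance": "20.00"} for i in range(5000)]
    for name, metrics in benchmark(sample).items():
        print(name, metrics)
```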

The results themselves might excite or scare a small number of developers. For the benefit of the rest of the world, I'll also be looking into why this is actually useful.

  • Will people notice that JSON is 607 ms faster?
  • Will your web server explode less frequently if you use YAML to talk to web services?

Just to top it all off, I'll also look at whether we need to be sending string-based serialised data between web services at all, and whether we might be better off opting for much, much faster choices such as Google Protocol Buffers. And anything else along the way that may be relevant, time permitting.

I need your help!

I've set up some tests into the perception of time delays. I'd like to initiate some form of distributed stress testing on some web services. There will surely be plenty of tests and tasks that would benefit from a few minutes of everyone's time.

When will the results be ready?

This is part of my final year project, due at the end of April 2009. I'll have some results before then (I hope!) and will write up short pieces where possible. I'll try to make full and final results available after I finish my final exams, so that'll be some time around the end of June 2009.
