Saturday, July 30, 2011

PayPal IPN and App Engine

Well, the experiment seems to be a success, Igor. The IPN endpoint has been in production and functioning for a couple of weeks now. Messages are coming from PayPal, hitting the distributor, and reliably being passed on to the IPN Endpoints. Not a single missed message. We have two of them running now, one for each subscription system.

We've had more trouble with the endpoints being unable to handle the extra streams, frankly. (Don't cross the streams, even though you can.) The IPN protocol is whatever the opposite of "stateful" is, so they already had enough trouble trying to figure out what each incoming message was for, and what to do with it. The more specific the trigger, though (like 'Affiliate Program' software that only cares about the sign-up, not the subscription process), the better it works. Fortunately, if things go weird, we just turn off the spigot to that IPN Endpoint, or apply some filters.

The only thing close to a severe 'bug' I encountered was not paying enough attention to the character encoding. Here are the two crucial points:

  1. PayPal sends IPN messages in charset "windows-1252" by default.
  2. Google App Engine Java programs expect "UTF-8".
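
To make the mismatch concrete, here is a small sketch in plain Java with no App Engine or Servlet classes involved. The class name and the sample value are made up for illustration. It decodes one URL-encoded name the way it would arrive under windows-1252, first with the right charset and then with UTF-8.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the distributor.
final class IpnCharsetMismatchDemo {

    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decode one URL-encoded form value using the given charset.
    // URLDecoder.decode(String, Charset) needs Java 10 or later.
    static String decodeValue(String urlEncoded, Charset charset) {
        return URLDecoder.decode(urlEncoded, charset);
    }

    public static void main(String[] args) {
        // "Mueller" with a u-umlaut, as windows-1252 puts it on the wire:
        // the umlaut is the single byte 0xFC.
        String onTheWire = "M%FCller";

        // Decoded with the charset it was sent in, the umlaut comes back.
        System.out.println(decodeValue(onTheWire, WINDOWS_1252));

        // Decoded as UTF-8, it cannot: a lone 0xFC is not valid UTF-8,
        // so the decoder substitutes a replacement character.
        System.out.println(decodeValue(onTheWire, StandardCharsets.UTF_8));
    }
}
```

Plain ASCII names decode identically either way, which is why nothing looks wrong until the first umlaut turns up.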

This wasn't really apparent until we had a few payments come in from people in Europe and Asia, with their hoarded supplies of umlauts. If you're wondering 'Why the difference?' it's because PayPal is apparently still stuck in 2002. I don't know.

The incompatibility doesn't seem like much, but in the GAE Servlet environment you need to call setCharacterEncoding() before you call getParameter() for the first time. Since you don't know what the encoding is until after you read the 'charset' parameter, we have a catch-22.

The 'correct' solution to this is to pull apart the input stream ourselves (thus duplicating all the Servlet code which already does this for us) in order to determine the original encoding. Then set the encoding format on the request object, and then re-read the data parameters as normal. Ugh.
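
For the record, the "pull it apart ourselves" approach might look roughly like this. It is a sketch only: the class and method names are hypothetical, it assumes you have already read the raw POST body into a byte array, and it assumes the body is ordinary form encoding with every non-ASCII byte percent-escaped.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
final class IpnBodyParser {

    // Find the 'charset' parameter without decoding anything else.
    // ISO-8859-1 maps every byte to exactly one char, so this view is lossless.
    static Charset sniffCharset(byte[] rawBody, Charset fallback) {
        String view = new String(rawBody, StandardCharsets.ISO_8859_1);
        for (String pair : view.split("&")) {
            if (pair.startsWith("charset=")) {
                String name = URLDecoder.decode(
                        pair.substring("charset=".length()), StandardCharsets.US_ASCII);
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException unknownOrIllegalName) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decode every parameter using the charset the message itself declares.
    static Map<String, String> parse(byte[] rawBody, Charset fallback) {
        Charset charset = sniffCharset(rawBody, fallback);
        String view = new String(rawBody, StandardCharsets.ISO_8859_1);
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : view.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // Sample data only, not a real IPN message.
        byte[] body = "last_name=M%FCller&charset=windows-1252"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(parse(body, StandardCharsets.UTF_8));
    }
}
```

Note that the raw bytes are kept untouched throughout; only the decoded copy is new.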

However, if you try this, you will run into the next issue: the verification callback to PayPal must exactly match the message you were sent. If you translate the incoming message into UTF-8 before distribution, the IPN endpoints won't even know what the original encoding was, and therefore how to send the message back.

The verification will fail, and PayPal will keep resending the message until timeout. Bad.
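
One way to stay on the right side of that rule is to never let the message pass through a String on its way back. A minimal sketch, assuming you still hold the raw POST bytes exactly as they arrived; the class and method names are hypothetical, and the actual HTTP call to PayPal is left out. The `cmd=_notify-validate` prefix is the standard IPN verification command.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; builds the body of the verification POST.
final class IpnVerificationBody {

    private static final byte[] PREFIX =
            "cmd=_notify-validate&".getBytes(StandardCharsets.US_ASCII);

    // Prepend the verification command to the untouched original bytes.
    // Nothing is decoded or re-encoded, so the charset never matters here.
    static byte[] build(byte[] rawBody) {
        byte[] out = Arrays.copyOf(PREFIX, PREFIX.length + rawBody.length);
        System.arraycopy(rawBody, 0, out, PREFIX.length, rawBody.length);
        return out;
    }

    public static void main(String[] args) {
        // Sample data only, not a real IPN message.
        byte[] raw = "last_name=M%FCller&charset=windows-1252"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(build(raw), StandardCharsets.US_ASCII));
    }
}
```

The same idea applies to whatever the distributor forwards: pass the original bytes along, and let each endpoint echo them back as-is.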

Now, even if PayPal is clever enough to ignore the raw difference in the character encoding field and do all the relevant translation perfectly (ha!), that still isn't good enough. So long as there isn't a complete 1:1 mapping between the two character sets (windows-1252 and UTF-8, and there isn't), there will always be 'edge cases' of particular characters that can't make the round trip. Certain names will break the system. Not normal names, but they'll be in there due to strange keyboards, or malice.
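
If you want to see a round trip fail, a few lines of plain Java will do it. This is only an illustration with a made-up class name and sample names: it pushes a string through windows-1252 and back, and reports whether it survived.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the distributor.
final class Windows1252RoundTripDemo {

    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encode to windows-1252 and decode again; true only if nothing was lost.
    static boolean survives(String text) {
        byte[] bytes = text.getBytes(WINDOWS_1252);
        return new String(bytes, WINDOWS_1252).equals(text);
    }

    public static void main(String[] args) {
        // u-umlaut (U+00FC) exists in windows-1252, so this survives.
        System.out.println(survives("M\u00FCller"));   // true

        // L-with-stroke (U+0141) does not exist in windows-1252;
        // the encoder substitutes '?', so the original name is gone.
        System.out.println(survives("\u0141ukasz"));   // false
    }
}
```

Anything outside the windows-1252 repertoire comes back as a question mark, and no amount of cleverness at the far end can recover it.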

The best solution is to back away slowly from the entire mess of character encoding translation and do the following:

  1. Set the IPN character encoding to UTF-8 in your PayPal account.

That's it. I really don't know why windows-1252 is still the default for new accounts. Pure inertia, I expect. That brings IPN up to code with the modern internet.

You'll have to check that your existing IPN Endpoints are UTF-8 compatible, of course, but if they are, I suggest you do this sooner rather than later, as future-proofing. (At least test it.) And for new accounts, do it on day one, because it's much scarier changing it once transactions are flowing. One mistake and they all go tumbling away into /dev/null.

Of course, IPN messages that are already in-flight (or re-sent from inside the PayPal account) continue using the old encoding (Arrrggghhhh!), so the switch-over takes a couple of days and can't rescue failing transactions. The solution for already-sent messages is to manually validate them in each Endpoint (if you have that facility) to push the transactions through.

They're real transactions; it's just that PayPal will never acknowledge them once the translation gremlins have been at work. Fortunately, it's only the random names that will be a little screwy for a month (until the next transaction corrects them), not financial data or item numbers.

When I get time, I should at least add an 'expected encoding' option to the software for those situations where you have absolutely no choice and need to force the Distributor into another character set. That might have all kinds of consequences, though.

Another logical extension to the IPN distributor would be to also intercept the validation callback from the IPN Endpoint to PayPal in order to translate the encoding back to the original. (Also, to mark off that endpoint as having received the message and prevent re-sends.) On close inspection this is quite a bad idea, because it removes the 'cross checking' inherent in having PayPal separately validate messages passed by the Distributor, and creates a single point of attack within the distributor/proxy. It might become necessary, but it's still a bad idea. Plus, how many IPN Endpoints allow you to change the callback URL from the "paypal.com/webscr/" default? (Some, but not all.)

It works. A base has been established. There are clearly a few extra problems to solve, but I have other things to mess with before that.
