Base64 is undoubtedly the best way to handle binary data in Lingo. The encoded string is safe for just about any purpose, with a size increase of just 33%.
This is quite a lot when you have huge amounts of data, but for most purposes it is negligible, especially compared to the likes of URL encoding, which adds a whopping 150% on average for high-entropy binary data such as the output of LingoFish.
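To check these figures yourself, you can encode the same random bytes both ways. The sketch below uses Python's standard library rather than Lingo, and treats `os.urandom` output as an assumed stand-in for high-entropy data. It should report about 33% for base-64 (every 3 bytes become 4 characters) and close to 150% for URL encoding, because roughly three quarters of random byte values have to be written as three-character `%XX` escapes.

```python
import base64
import os
import urllib.parse

def overhead(encoded_len: int, raw_len: int) -> float:
    """Size increase of an encoding, as a fraction of the raw size."""
    return encoded_len / raw_len - 1

# Random bytes stand in for high-entropy data such as encrypted output.
data = os.urandom(30_000)

# Standard RFC 2045 alphabet, no line breaks.
b64 = base64.b64encode(data)
# With safe="", only letters, digits and _.-~ are left unescaped.
url = urllib.parse.quote_from_bytes(data, safe="")

print(f"base-64:      +{overhead(len(b64), len(data)):.0%}")
print(f"URL encoding: +{overhead(len(url), len(data)):.0%}")
```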
News: Alex da Franca has optimised the string handling of this function for the Mac. His version is nearly 100 times faster on the Mac while negligibly slower on Windows, so I would generally recommend using his version.
History
2005-11-10: Version 1.2 by Alex da Franca. This version optimises the string handling for the Mac. The result is a minor slowdown on Windows, offset by a near 100x speed increase on the Mac.
2003-11-27: Version 1.11 released. Fixed a bug in URL-safe encoding. Thanks to Barry Swan for spotting this stupid oversight.
2003-11-16: Version 1.1 released. First release on this site.
About the code
There is more than one way to implement base-64, and my code is quite flexible. If you want to understand the options (which are fully documented in the source code), read the following introduction.
RFC 2045
Although you could implement base-64 encoding any number of ways, the standard that everyone uses was originally set out in RFC 1521, and later in RFC 2045.
This is the preferred encoding scheme to use with HTTP POST, for a number of reasons. For one, it is absolutely guaranteed not to break the transport.
For another, this standard version of base-64 is very widely supported. Most web scripting languages have built-in functions for dealing with data encoded this way. ASP notably doesn't, but there are lots of free VBScript implementations available.
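The standard scheme itself is simple: every 3 input bytes (24 bits) are split into four 6-bit values, each of which picks one character from a 64-character alphabet, and a short final group is padded with `=`. That 3-to-4 ratio is where the 33% size increase comes from. The Python sketch below illustrates the RFC 2045 algorithm; it is not the Lingo code from this page, and the name `b64_encode` is just for illustration.

```python
import base64

# The 64-character alphabet defined in RFC 2045 (indices 0..63).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    """Encode bytes as RFC 2045 base-64, without line breaks."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into a 24-bit integer, padding with zero bytes.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A short final chunk yields only len(chunk) + 1 significant
        # characters; the rest become '=' padding.
        if len(chunk) < 3:
            chars[len(chunk) + 1:] = "=" * (3 - len(chunk))
        out.append("".join(chars))
    return "".join(out)

# Sanity check against Python's built-in encoder.
assert b64_encode(b"Lingo") == base64.b64encode(b"Lingo").decode("ascii")
print(b64_encode(b"Lingo"))  # TGluZ28=
```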
Line wrapping
For POST data you need to wrap your lines. The RFCs have, in the past, been ambiguous on this subject, but everyone has now agreed that for safety you should stick to an 80-character limit.
Although lines a little over 80 characters should never cause you real problems, there is a real possibility that lines over 255 characters will be truncated by older network hardware. RFC 2045 says that base-64 encoded lines must be no more than 76 characters long, so my code wraps at 76 characters unless you tell it otherwise. Better safe than sorry.
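Wrapping amounts to cutting the encoded string into fixed-length chunks and joining them with CRLF, the line ending used by Internet mail. The Python sketch below is illustrative only: the function name and its `line_length` parameter are hypothetical and are not the options of the Lingo code. Passing `line_length=0` turns wrapping off, as discussed next.

```python
import base64

def b64_encode_wrapped(data: bytes, line_length: int = 76) -> str:
    """Base-64 encode data, inserting CRLF every line_length characters.

    line_length <= 0 disables wrapping (hypothetical parameter).
    """
    encoded = base64.b64encode(data).decode("ascii")
    if line_length <= 0:
        return encoded
    # Cut the encoded text into fixed-length lines; the last may be shorter.
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return "\r\n".join(lines)

wrapped = b64_encode_wrapped(bytes(200))  # 268 characters in 4 lines
print(max(len(line) for line in wrapped.split("\r\n")))  # 76
```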
If, however, you are simply using base-64 encoding to get around the shortcomings of FileIO, then you can save a small (very small) amount of disk space by turning off line wrapping. This does not cause any problems with decoding.
In fact, any RFC 2045-compliant decoding function will decode data with or without line breaks, because the RFC requires decoders to ignore line breaks and any other characters outside the base-64 alphabet. The line breaks are required by the RFC as part of the wider standards for sending Internet messages, but if you are not sending the data over the Internet they are irrelevant.
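As an illustration, Python's standard decoder behaves this way by default. The sketch below (Python rather than Lingo, with a made-up sample string) decodes the same data with and without a line break. It also shows that a decoder run in a strict, non-default mode may reject line breaks, so it is worth checking how your own decoder is configured.

```python
import base64
import binascii

original = b"Hello, Director!"
plain = base64.b64encode(original).decode("ascii")

# Simulate a wrapping encoder by inserting a CRLF part-way through.
wrapped = plain[:8] + "\r\n" + plain[8:]

# With validate=False (the default), b64decode discards characters outside
# the base-64 alphabet, such as CR and LF, so both forms decode identically.
assert base64.b64decode(plain) == original
assert base64.b64decode(wrapped) == original

# Strict mode is different: validate=True rejects the line break.
try:
    base64.b64decode(wrapped, validate=True)
except binascii.Error:
    print("strict mode rejects line breaks")
```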
URL-safe Base-64
Now, it turns out that RFC 2045 base-64 has a minor shortcoming when it comes to the web: when it is used with HTTP GET, three of its characters ('+', '/' and '=') require additional URL encoding. This is obviously not desirable if you want to include small amounts of encoded data in a querystring. However, I have found a better workaround than encoding the data twice.
With the exception of line breaks, these are the only three offending characters in the possible output of base-64. It also turns out that there are characters base-64 never uses that are safe for use in URLs, so each offending character can be swapped for one of them: '+' becomes '-', '/' becomes '_', and '=' becomes '.'. Therefore, all that we need to do is make these three substitutions and remove the line wrapping.
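Here is a minimal Python sketch of the encoding side of this scheme, assuming the same substitutions that the PHP decoder on this page reverses. The function name `url_safe_b64encode` is hypothetical. Note that Python's built-in `base64.urlsafe_b64encode` is a slightly different scheme: it replaces '+' and '/' with '-' and '_' but keeps the '=' padding, so the two are not interchangeable.

```python
import base64

# '+' -> '-', '/' -> '_', '=' -> '.'
TO_URL_SAFE = str.maketrans("+/=", "-_.")

def url_safe_b64encode(data: bytes) -> str:
    """Base-64 encode data with no line breaks and URL-safe substitutes."""
    return base64.b64encode(data).decode("ascii").translate(TO_URL_SAFE)

# Standard encoding of these two bytes is "+/8=", which uses all three
# offending characters.
print(url_safe_b64encode(bytes([0xFB, 0xFF])))  # -_8.
```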
With GET data the line-wrap problem is slightly different from POST. URLs longer than 80 characters are common, but you may still have problems with lines exceeding 255 characters.
Either way, you cannot put a newline in a URL, so you have to turn off line wrapping. You must simply make sure that the final URL is no longer than 240 characters, excluding the domain name.
The 240-character limit exists because an HTTP GET request sends the URL on a single line (the request line), together with a few extra characters: the method, the protocol version, the separating spaces and the line ending. The domain name is not placed on that line; it goes elsewhere (in HTTP/1.1, the Host header). The total length of a line cannot exceed 255 characters, so after taking off the extra characters we are left with 240.
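The arithmetic can be checked directly. The sketch below assumes an HTTP/1.1 request line of the form `GET <path> HTTP/1.1` followed by CRLF, and takes the 255-character limit from the text above; the helper names are hypothetical.

```python
# Hypothetical helper: the request line an HTTP/1.1 client sends for a GET.
# The domain name is not part of it; HTTP/1.1 sends it in the Host header.
def request_line(path_and_query: str) -> str:
    return f"GET {path_and_query} HTTP/1.1\r\n"

MAX_LINE = 255                           # limit attributed to older hardware
fixed_overhead = len(request_line(""))   # "GET " + " HTTP/1.1" + CRLF
MAX_PATH = MAX_LINE - fixed_overhead     # room left for path and querystring

def fits_in_request_line(path_and_query: str) -> bool:
    return len(request_line(path_and_query)) <= MAX_LINE

print(fixed_overhead, MAX_PATH)  # 15 240
```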
To decode "URL-safe" base-64 on the server you will need to make some minor modifications to the decoding function. If you are using your own scripts, the change is very simple and should be obvious from the Lingo source code. If you are using PHP, you can simply use the following function:
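function urlBase64_decode( $string )
{
    // Undo the URL-safe substitutions made when the data was encoded
    $string = str_replace( '-', '+', $string );
    $string = str_replace( '_', '/', $string );
    $string = str_replace( '.', '=', $string );
    // The result is standard RFC 2045 base-64, which PHP decodes natively
    return base64_decode( $string );
}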
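If your server-side scripts are written in Python rather than PHP (an assumption; the page itself only covers PHP), the same idea is a single character translation followed by a standard decode. The function name `url_base64_decode` is hypothetical; it mirrors the PHP `urlBase64_decode` function above.

```python
import base64

# Undo the URL-safe substitutions: '-' -> '+', '_' -> '/', '.' -> '='.
FROM_URL_SAFE = str.maketrans("-_.", "+/=")

def url_base64_decode(text: str) -> bytes:
    """Decode URL-safe base-64 by restoring the standard alphabet first."""
    return base64.b64decode(text.translate(FROM_URL_SAFE))

print(url_base64_decode("SGk."))  # b'Hi'
```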
Summary
To conclude, these are the rules of thumb:
Short chunks of data can be put in the querystring with "URL-safe" encoding and no line wrapping.
Whenever you use GET, ensure your entire URL (excluding the domain name) is no more than 240 characters. Also, if you are sending new data with GET, as opposed to requesting data, make sure you use a cache-buster (for example, put the current time in milliseconds in the URL).
If you need to send more data over the Internet, use POST and RFC 2045 compliant encoding (the default).
If you don't care about line wrapping and want to save space (e.g., when saving with FileIO), then you can use either version with line wrapping disabled. In this case, standard mode is recommended because any base-64 decoding function will decode it, regardless of whether it has line wrapping or not.
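The GET rules above can be combined into one small helper. This Python sketch is hypothetical throughout: the endpoint `/save.php` and the querystring parameters `data` and `t` are made up for illustration. It applies the URL-safe substitutions, appends the current time in milliseconds as a cache-buster, and refuses to build a URL longer than 240 characters (excluding the domain name).

```python
import base64
import time

TO_URL_SAFE = str.maketrans("+/=", "-_.")
MAX_PATH = 240  # URL length limit, excluding the domain name

def build_get_path(endpoint: str, payload: bytes) -> str:
    """Build a GET path and querystring carrying payload as URL-safe base-64.

    The endpoint and the parameter names 'data' and 't' are hypothetical.
    """
    data = base64.b64encode(payload).decode("ascii").translate(TO_URL_SAFE)
    # Cache-buster: the current time in milliseconds makes each URL unique,
    # so caches along the way are unlikely to serve a stored response.
    millis = int(time.time() * 1000)
    path = f"{endpoint}?data={data}&t={millis}"
    if len(path) > MAX_PATH:
        raise ValueError(f"path is {len(path)} characters; send this with POST instead")
    return path

print(build_get_path("/save.php", b"Hi"))
```

A `ValueError` here is the cue to switch to POST with standard, wrapped encoding.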