Thoughts on MySQL:

Calculate the distance between two U.S. zip codes

Every once in a while a Web developer gets the opportunity to use those math skills he honed in college.  (I suppose this is only theoretical, since I didn’t go to college.)

We’re working on a project that calls for restricting search results to those occurring within a fixed distance from a user-supplied zip code.  The premise is simple enough: discover the geographical location of two zip codes and calculate the distance between them.  Of course, there are a few prerequisites:

  • a database of geocoded zip codes
  • a formula for calculating the distance between them

I thought the first resource would be difficult to find, especially considering my budget for this project is exactly $0.  But lo and behold, much to my surprise and delight, the folks over at PopularData.com provide such a resource, and it’s free!  (As well it should be, since it’s probably based on the U.S. Census data collected in 2000 or earlier.)  They claim that the database is mostly complete, except for a few dropouts (most of which, they say, are military stations, and not too useful to our application.)

I had built an application similar to this one in the past, so I knew somewhere someone had written down the formula for calculating the distance.  Remember that it’s not just a simple point to point calculation.  No, that would be too easy.  You see, the Earth is round.  (No, really.)  And the curvature of the Earth increases the distance between two points.  Oh, if only Columbus had been wrong: think of how much gas we’d save!

In fact, the formula for calculating this distance is known as the Haversine formula. You can read more about it, if you wanna; but it basically goes like this (in JavaScript):

var R = 6371; // circumference of the Earth, in kilometers
var dLat = (lat2-lat1).toRad();
var dLon = (lon2-lon1).toRad();
var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
Math.cos(lat1.toRad()) * Math.cos(lat2.toRad()) *
Math.sin(dLon/2) * Math.sin(dLon/2);
var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
var d = R * c;
copy code

For our application, we basically want a view of our zip code data that represents the set of zip codes within X miles (or kilometers) of a given zip code origin.  We’ll use that view to filter our other records before returning those to the end-user.  To achieve this, I’ve written three MySQL procedures:

  • function km, which performs the calculation between two sets of latitude and longitude coordinates
  • function miles, which relies on km to do the calculation, and then converts the result to miles
  • procedure inside, which accepts two parameters: a zip code origin, and a maximum distance with units (km for kilometers, and mi for miles)

So, to query those zip codes within a ten mile radius of my home, the SQL looks something like this:

CALL inside('22601', '10mi');
copy code

And the results are a bit like this:

"22601",39.1697,-78.1686,"WINCHESTER","VA",0
"22604",39.1676,-78.1686,"WINCHESTER","VA",0.142347455024719
"22655",39.1634,-78.2462,"STEPHENS CITY","VA",4.17041921615601
"22656",39.2137,-78.0901,"STEPHENSON","VA",5.17398071289062
"22602",39.1501,-78.269,"WINCHESTER","VA",5.53605270385742
"22603",39.264,-78.1989,"WINCHESTER","VA",6.70094060897827
"22638",39.2369,-78.2885,"WINCHESTER","VA",7.90875339508057
"22624",39.2719,-78.0998,"CLEAR BROOK","VA",7.94622468948364
"22622",39.2543,-78.0664,"BRUCETOWN","VA",7.98962259292603
"22611",39.1357,-77.9919,"BERRYVILLE","VA",9.72859001159668
copy code

To increase the usefulness of the result set, we sort the results of the stored procedure by their distance from the origin: least to greatest.

I hope this turns out to be as useful for you as it is for us.  If you’d like to get a head start, feel free to download our source files:

PHP ORM with annotations

Update An update at the top of a post?  Yeah, well, the thing is, I decided to terminate the sfEzModel project.  For one thing, I’ve started thinking Syfmony ain’t exactly the best PHP framework on the block.  Furthermore, I don’t think this particular solution deserves to be stuck inside a framework. It should be free: standalone, or dropped in a framework, whatever.  So I’ve replaced it with a different project: scottlib.

Named after my best friend, scottlib is intended to be a SQL abstraction tool that even a designer could easily incorporate into his wireframed project.  That’s a tall order to fill, but along the way I’ll be doing some cool stuff too, like emulating GORM, and making it easy to switch between MySQL and Amazon’s SimpleDB.  Hang around, it’s going to get cooler.

I recently started our first Google Code project – sfEzModel: a PHP ORM library with annotation configuration, optimized for MySQL and designed for easy integration into Symfony Web applications.

“What?” you ask? ORM stands for Object-Relational Mapping and “is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages” (Wikipedia).  Basically, ORM is an attempt to make-easier the process of persisting (storing) object data – data that has an explicit form or type yet doesn’t readily lend itself to the scalar data types of an RDBMS like MySQL (or SQLite or SQL Server).  If you’re still in the dark, check out the Wikipedia article.

“Why?” you ask? Well, first and foremost: writing libraries is fun (for me).  Writing a component to be incorporated into larger applications is like making bricks: without finely made bricks, no building can withstand the test of time.  It’s an activity that lends a great sense of satisifaction, and I suspect that in this day and age it will grow some grassroots (read: free) marketing for our business. Collegeman.net should have its own showing of open source wares: concrete software to represent our capacity for development and a repayment for the many contributions by the open source community to our own efforts.

The most practical reason is that I don’t like what the community has to offer in the way of PHP ORM.  There are several features of the PHP programming language that lend much speed to the development process.  My goal is to embrace those features, generate less code, and create a sound framework upon which applications can be more rapidly constructed.  Initially we’re going to be committed to getting the framework up and running with MySQL and Symfony.   Future iterations will open the library up to inclusion within multiple frameworks and for persistence into multiple RDBMS platforms.

You can follow the development of this new library on our Google Code wiki at http://code.google.com/p/sfezmodel.  Presently we are planning our first alpha release for October 26th – my birthday!

Subscribe to Perseverance Trumps Talent