tags: address, USPS


Conan, What is best in life?

Had a good day yesterday. Found the bug in my project (at work) that's been driving the numbers off. This morning i remembered to bring my gym clothes to work and to stop by Marshall HS on the way. You see, i thought i left my badge and favorite pen there in the auditorium. No luck...

So, arriving at work, i figured i'd just check under the seat. BAM, found my pen. But no badge. Looking around further, i spotted it in the passenger door pocket. And suddenly i realized how it got there: i had placed it on some CDs on the passenger seat, and as i made a hard left they slid, sinking the badge into the side pocket. Score.

Today i continue work on address matching, a project i had, ohhh, about a year ago. I'm taking postal addresses and normalizing them into proper form. Beyond that, i'm connecting two datasets based on address.
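A rough sketch of what normalizing and matching might look like, not the code from the actual project. The function names (normalizeAddress, matchByAddress), the 'address' array key, and the tiny suffix table are all assumptions made up for illustration; a real job would need the full USPS abbreviation list plus handling for units, directionals, and so on.

```php
<?php
// Hypothetical helper: reduce a street address to a comparable key.
// The suffix table is a small illustrative subset, not the full USPS list.
function normalizeAddress(string $raw): string
{
    $suffixes = [
        'STREET' => 'ST', 'AVENUE' => 'AVE', 'LANE' => 'LN',
        'ROAD' => 'RD', 'DRIVE' => 'DR', 'BOULEVARD' => 'BLVD',
    ];
    // Uppercase, turn punctuation into spaces, then split on whitespace.
    $clean = preg_replace('/[^A-Z0-9 ]/', ' ', strtoupper($raw));
    $words = preg_split('/\s+/', trim($clean));
    // Abbreviate only the last word, and only when it is a known suffix.
    $last = count($words) - 1;
    if (isset($suffixes[$words[$last]])) {
        $words[$last] = $suffixes[$words[$last]];
    }
    return implode(' ', $words);
}

// Hypothetical helper: pair up rows from two datasets whose
// normalized addresses are identical.
function matchByAddress(array $left, array $right): array
{
    $index = [];
    foreach ($right as $row) {
        $index[normalizeAddress($row['address'])] = $row;
    }
    $matches = [];
    foreach ($left as $row) {
        $key = normalizeAddress($row['address']);
        if (isset($index[$key])) {
            $matches[] = [$row, $index[$key]];
        }
    }
    return $matches;
}

echo normalizeAddress('169 Ruth Lane'), "\n";   // 169 RUTH LN
echo normalizeAddress(' 169  ruth ln. '), "\n"; // 169 RUTH LN
```

Exact matching on the normalized key is the simplest possible join. It misses misspellings, which is presumably where most of the real work lies.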

XmlDoc Substring

I wrote a useful PHP class i'd like to share. Its whole purpose is to take a substring of an XML document while preserving well-formedness. I created it because i incorporate other people's articles into my blog but need to limit the size of the summaries. Examples and source code are included below.

I wanted to create a class "myXmlDoc" as an extension of xmldoc in PHP, but apparently xmldoc is not a class, just a collection of functions.

This is how xmlDoc is used.

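// The input must be a complete, well-formed document.
$str_Doc = "<person><name>Brian</name><hobby>writing code</hobby></person>";
$XmlDoc = new xmlDoc( $str_Doc );
// Only take a substring when the input parses cleanly.
if ($XmlDoc->isWellFormed()) {
    $SS = $XmlDoc->substring( 30 );
}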

Here's a larger example.

Here's the source code licensed under GPL.

One thing this code doesn't do is maintain validity. Suppose an address element is composed of mandatory street, city, and state elements. If you wanted a short enough substring, this function would cut the state element first, then city, and finally street, leaving an address that is well-formed but no longer valid.
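To make that concrete, here is a minimal sketch of well-formedness-preserving truncation. It is not the GPL source mentioned above, just one way such a cut could work. The function name truncateXml is made up, and so is the idea of budgeting by text characters only. It assumes the input is already well-formed, contains no comments, CDATA sections, or processing instructions, and uses single-byte text (it may split an entity such as &amp; in half).

```php
<?php
// Hypothetical helper: keep at most $maxText characters of text content,
// then close every element that is still open.
function truncateXml(string $xml, int $maxText): string
{
    // Split into tags and text runs, keeping the tags as tokens.
    $tokens = preg_split('/(<[^>]+>)/', $xml, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $out  = '';
    $open = [];        // stack of element names not yet closed
    $left = $maxText;  // text characters we may still emit

    foreach ($tokens as $tok) {
        if ($left <= 0) {
            break;                          // budget spent: stop reading
        }
        if ($tok[0] !== '<') {              // text run: keep what fits
            $piece = substr($tok, 0, $left);
            $out  .= $piece;
            $left -= strlen($piece);
        } elseif ($tok[1] === '/') {        // closing tag
            array_pop($open);
            $out .= $tok;
        } else {                            // opening or self-closing tag
            if (substr($tok, -2) !== '/>') {
                preg_match('/^<([^\s>\/]+)/', $tok, $m);
                $open[] = $m[1];
            }
            $out .= $tok;
        }
    }
    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

$flat = '<address><street>169 Ruth Lane</street>'
      . '<city>Rockville</city><state>MD</state></address>';
echo truncateXml($flat, 22), "\n";
// <address><street>169 Ruth Lane</street><city>Rockville</city></address>
```

The result still parses, but the mandatory state element is gone, so a schema that requires it would reject the output.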

Tangentially, i would represent a postal address the way i represent time: in a hierarchical structure, like this.

<postalAddress>
  <country value='USA'>
    <state value='MD'>
      <city value='Rockville'>
        <zip value='20742'>
          <street value='Ruth Lane'>
            <number value='169' />
          </street>
        </zip>
      </city>
    </state>
  </country>
</postalAddress>

Each sub-element is optional, which gives you arbitrary precision. With this structure, a substring maintains validity in addition to well-formedness. If you are forced to omit data, the first to go are the finest details. Saying you live in the USA has more meaning than saying you live at 169.
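A small sketch of the arbitrary-precision idea, under some assumptions: the function name buildPostalAddress and the array keys are made up for illustration, and stopping at the first missing level is one interpretation of "optional". It nests the parts from coarsest to finest, so dropping detail only ever removes the innermost elements.

```php
<?php
// Hypothetical helper: nest address parts from coarsest to finest and
// stop at the first level that is missing.
function buildPostalAddress(array $parts): string
{
    $levels = ['country', 'state', 'city', 'zip', 'street', 'number'];
    $open  = '';
    $close = '';
    foreach ($levels as $level) {
        if (!isset($parts[$level])) {
            break;  // a finer level has no parent to hang from
        }
        // Escape the value so quotes and angle brackets stay harmless.
        $value = htmlspecialchars((string) $parts[$level], ENT_QUOTES);
        $open .= "<$level value='$value'>";
        $close = "</$level>" . $close;
    }
    // The innermost element is written as an empty open/close pair,
    // which is equivalent to the self-closing form.
    return '<postalAddress>' . $open . $close . '</postalAddress>';
}

echo buildPostalAddress(['country' => 'USA', 'state' => 'MD']), "\n";
// <postalAddress><country value='USA'><state value='MD'></state></country></postalAddress>
```

Whatever precision you pass in, the output has the same shape, just shallower, which is what lets a shortened address stay valid.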