

Archive for the ‘php’ Category

Batch including headscripts and links in Zend Framework

Thursday, September 16th, 2010

A bit of a dull post to break my silence with, but after a summer away, it’s back to the grindstone of earning a living through programming. Fingers crossed I’ll be given the opportunity to work again on a website I finished ages ago, fixing all the things I did badly and adding some enhancements.

I used Zend Framework for the back-end of the site, and first time around I had a very rudimentary grasp of how to use it, but no real in-depth understanding of its finer points. I’m now finding that I did a lot of things the long way round (e.g. typing out JSON by hand rather than using the Zend_Json class to build it automatically). One major bugbear was that I was adding all my javascript and css files individually to the document head, and that the syntax for doing so in Zend wasn’t really any more concise than just hard-coding <link rel="stylesheet" ….

So now that I’m revisiting Zend, the first thing I’ve done is write a couple of view helpers to batch-add javascript and css files, which I thought I’d share here.

<?php

class Zend_View_Helper_IncludeStyles extends Zend_View_Helper_Abstract {

   public function includeStyles($folder)
   {
     $handler = opendir(getenv("DOCUMENT_ROOT") . "/css/" . $folder);
     // Compare against false explicitly: a file named "0" would
     // otherwise end the loop early.
     while (($file = readdir($handler)) !== false) {
       if ($file != "." && $file != "..") {
         $this->view->headLink()->appendStylesheet('/css/' . $folder . '/' . $file);
       }
     }
     closedir($handler);
   }
}
<?php

class Zend_View_Helper_IncludeScripts extends Zend_View_Helper_Abstract {

  public function includeScripts($folder)
  {
    $handler = opendir(getenv("DOCUMENT_ROOT") . "/js/" . $folder);
    // Compare against false explicitly so a file named "0" doesn't end the loop.
    while (($file = readdir($handler)) !== false) {
      if ($file != "." && $file != "..") {
        $this->view->headScript()->appendFile('/js/' . $folder . '/' . $file);
      }
    }
    closedir($handler);
  }
}
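With the helpers in place, a layout or view script can pull in a whole folder of assets and then render the accumulated tags in one go. A minimal sketch (the folder names are made up for illustration):

```
<?php
// In a Zend layout/view script: batch-add the folders' contents,
// then render the accumulated tags via the standard placeholders.
$this->includeStyles('main');    // every file in /css/main
$this->includeScripts('main');   // every file in /js/main
echo $this->headLink();
echo $this->headScript();
```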

Cooking with SQL

Friday, February 26th, 2010

SQL was the site of my first forays into programming, back in the days when I managed records for an educational project and came to the heretical conclusion that MS Access was better suited to the task than Excel. But that’s more or less where my learning of SQL stopped, and even then it was limited to SELECT, WHERE and ORDER BY clauses (I let Microsoft’s wizards do all the hard work of building multiple-table queries).

Fast forward to yesterday, when I decided to finally take the bull by the horns. The back-end of my latest educational endeavour was, I guessed, suffering speed-wise because I made no use of JOIN on my MySQL tables; each time I wanted records related to records in another table, I would use nested loops in php to fetch the related records for each row.

In my defence, I largely chose this approach due to 3 major flaws in Zend Framework’s documentation:

  1. It doesn’t explicitly mention that when you build a Zend_Db_Select query based on a Zend_Db_Table class (i.e. a Model), the FROM clause of the query is filled in automatically. Attempting to fill it in yourself causes an error.
  2. It doesn’t mention anywhere that in order to use JOIN within a Zend_Db_Select query based on a Zend_Db_Table class you need to use the ->setIntegrityCheck(false) method. Without this, all manner of confusing errors occur.
  3. This wording: “You can not specify columns from a JOINed tabled to be returned in a row/rowset. Doing so will trigger a PHP error”, coupled with the fact that I was getting lots of errors, led me to believe that Zend had neglected to add JOIN functionality to its models, so I dropped that line of attack. (In fact, all that quote means is that you cannot change the default behaviour, which is to fetch all columns.)

But now I have of course overcome all this, and have managed, for example, to reduce about 30 lines of code (get a teacher’s classes, then get those classes’ assignments, then get all the attempts at those assignments, and for each of these get the individual puzzle solutions submitted) to a single line using 3 RIGHT JOINs and one INNER JOIN. It’s much neater, and I can only guess at the improvement in speed it brings; my guess is “oodles”.

As well as improving my application, learning about Zend_Db_Select (the documentation for this is remarkably well written, considering the surroundings) via the __toString() method has increased my understanding of the underlying SQL, to the point where for the first time I can write non-trivial queries from scratch (e.g. updating a field based on a join with another table), which is a great addition to my programming armoury.
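For anyone hitting the same walls, here’s a rough sketch of the shape such a joined query takes; the table and column names are invented, not the real project’s, and this is a from-memory sketch rather than definitive code:

```
// Inside a method of a Zend_Db_Table_Abstract model (hypothetical names).
// setIntegrityCheck(false) is the crucial, poorly documented step.
public function getClassAssignments($teacherId)
{
    $select = $this->select()
        ->setIntegrityCheck(false)
        ->join('classes', 'classes.teacher_id = teachers.id')
        ->join('assignments', 'assignments.class_id = classes.id')
        ->where('teachers.id = ?', $teacherId);

    // echo $select->__toString();  // shows the generated SQL - great for learning

    return $this->fetchAll($select);
}
```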

How I’d teach myself programming, if I could do it all over again

Wednesday, February 17th, 2010

As I mentioned the other day, web development as a career is rare in that you can just pick it up off the internet unlike, say, luxury pet grooming. But it took me a while to find good strategies to learn what I wanted to learn. If I had a time machine I’d go back and tell myself the following*.

Don’t buy beginners guide programming books

I bought a couple of these on php/mysql and javascript. I won’t say they were a complete waste of money as I did learn the basics from them, but there are several good reasons not to rely on them for your starting point:

  1. These books are almost always a long-winded, incomplete version of the programming language’s documentation (which is normally available for free online), structured around building an example application which probably bears little resemblance to anything you would like to build, e.g. a quiz about the Simpsons.
  2. Unlike online resources, these books are not searchable and lack easy-to-follow cross-references.
  3. Online tutorials are more up to date.

Find a good online tutorial

For any programming language there will be loads of beginner’s tutorials online; just search Google for “[language name] beginner’s tutorial”. The top results won’t generally be the best, however, so open up lots of tutorials in lots of tabs, narrow them down to a few of the best and bookmark them, before starting to follow one. If you get stuck on a section you can always try the explanations given in your other bookmarked tutorials, or search Google for “[programming language] [topic] explained”.

Learn to use documentation

It took me a long time to realise that most programming language documentation follows the same structure, and once you understand this you are able to teach yourself any language. Roughly, a programming language (at least, the ones I know) is a collection of types of thing (objects, strings, arrays, numbers etc.) and processes (loops, conditionals, functions) for manipulating things; some things have built-in sub-things (properties) and their own dedicated processes (methods), and most processes will only work on certain types of thing (arguments of the correct type).

Well written documentation will list all the above information systematically (together with the basic syntax and rules of the language), so that if you create a variable of a certain type you can find out what you are able to do to it, or if you want to use a function you can find out what conditions its arguments need to meet. An understanding of object oriented programming also goes a long way to being able to grasp documentation, but isn’t essential for a beginner.
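To make the “things and processes” idea concrete, here is a tiny, self-contained PHP example (the values are arbitrary): string processes work on strings, array processes on arrays, and the documentation for each type lists exactly which processes apply.

```php
<?php
// "Things" (types) and the "processes" (functions) that apply to them.
$word = "documentation";                       // a string
$list = array("objects", "strings", "arrays"); // an array

echo strtoupper($word) . "\n";   // a string process: DOCUMENTATION
echo strlen($word) . "\n";       // another string process: 13
echo count($list) . "\n";        // an array process: 3
echo in_array("arrays", $list) ? "yes\n" : "no\n";  // yes
```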

Use libraries… lots of libraries

Not the ones with books. A library (sometimes called a framework) is a collection of software written by somebody else that takes care of some tedious/difficult processes for you. The classic example at the moment would have to be jQuery. Without jQuery the differences between browsers’ implementations of javascript would make developing javascript web applications a specialised and difficult task with unreliable results. Because jQuery is a collection of code that thinks about all the cross-browser differences for you (as well as doing lots of other useful tasks) creating reliable javascript applications is now something even beginners can take on. Some libraries also have thriving communities that build plug-ins to extend the functionality further.

And to make use of all this all you have to do is include a file (or collection of files) and get to grips with the library’s documentation (often called an API – Application Programming Interface) which, no matter how daunting it may seem at first, is guaranteed to be easier than writing all the code yourself.

Invest in some expert/advanced books

Beginner’s guides may have been made redundant by the internet, but there is still room for more advanced books. Yes, the information is probably on the internet somewhere but structured tutorials aimed at more advanced users are far less common than beginner’s tutorials. I won’t recommend any books myself as I don’t consider myself enough of an expert to judge, and I also don’t own many yet, but the ones I do have are full of techniques I couldn’t have worked out for myself.

And that’s how I should have done it!

*Like hell I would. Straight to the bookies it’d be.

Deletious

Sunday, February 14th, 2010

Well, this has to be the quickest I’ve ever gone from idea to publishable (albeit limited functionality) website.

Deletious is my new site for simultaneously viewing a page bookmarked in Delicious and deciding whether to keep or delete the bookmark. I’ve had quite a lot of fun using it the last few days, rediscovering all sorts of articles, games, tools and other long-forgotten sites. As well as wasting a lot of time reacquainting myself with all these, I’ve also managed to de-clutter my Delicious account; all the CSS articles from 2-3 years ago giving an introduction to topics I now know inside out are gone from my bookmarks, as are all those gimmicky websites I can’t believe I found funny at one time.

Disappointingly, I’m having problems uploading the logo to the website’s folder, but it’ll be sorted sometime soon I hope.

So please do give it a go and let me know what you think.

EDIT – There’s a bug that pops up every now and then (something to do with caching) which leads Deletious to show zero bookmarks for your account. I’ll fix the bug when I get time, but waiting a few hours seems to clear the cache (at least, it works for my account) and then you can access your bookmarks again.

Delicious, though not so easy to swallow

Thursday, February 11th, 2010

For a long time I’ve wanted to work with the Delicious API. Initially it was because the Delicious website not only had the difficult-to-remember del.icio.us url, but was also very badly designed. If you compared its progress – addition of new features, cleaning up of design, making use of new techniques such as AJAX – with its web2.0 compatriots (Flickr, Digg, boris-johnson.com) it lagged way behind.

So I initially planned to build a new front-end for it, making it easier to work with your bookmarks, but before I could progress far enough in my coding abilities they completely redesigned the site; a vast improvement.

Though still not perfect. For a while I’ve found it frustrating that there is no easy way to simultaneously see the content of a bookmarked page and delete the bookmark if you deem it no longer useful, so my Delicious account gradually got more and more cluttered. Well, this afternoon I decided to do something about it (and not just because I’m avoiding doing more important stuff).

But I was foiled for a long time by the laziness of the Delicious developers. My initial plan was to use javascript to get a JSON of all my bookmarks (or alternatively request them one at a time) and go through them one by one, displaying the webpage in an iframe and offering the option to discard or keep the bookmark. However, Delicious only publishes this data as XML, which means, due to cross-domain restrictions on AJAX, you can’t just use javascript. I may be a bit hasty in pinning this on developer laziness, but I imagine creating alternate templates (because that’s all the difference between JSON and XML really) wouldn’t be too time-consuming, and would greatly enhance the versatility of the API.

Anyway, I realised I would have to use a bit of PHP to get the XML and create pages from which my javascript would be able to access the data. Luckily, before I dived straight in I came across phpdelicious (which, appropriately, I have now bookmarked in Delicious), a very easy-to-use php class wrapping the Delicious API, which is very handy indeed. Less than an hour later I had built exactly what I wanted.
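At its core, that PHP bridge is just re-serialising XML as JSON. A self-contained sketch of the idea, using a simplified, made-up fragment of the kind of XML the API returns (the real feed has more attributes):

```php
<?php
// Parse a (simplified, assumed) chunk of Delicious-style XML and
// re-emit it as JSON for in-page javascript to consume.
$xml_string = '<posts>'
    . '<post href="http://example.com/" description="An example" />'
    . '<post href="http://example.org/" description="Another one" />'
    . '</posts>';

$posts = array();
foreach (simplexml_load_string($xml_string)->post as $post) {
    $posts[] = array(
        'href'        => (string) $post['href'],
        'description' => (string) $post['description'],
    );
}
echo json_encode($posts);  // note: json_encode escapes the slashes in urls
```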

I reckon with a few more hours’ development I can make it a publicly available service. All I need to do is include a form so other users can log in, and (ideally) preload websites in the iframe to speed things up (though this is problematic, as some sites force the whole web page to be redirected if you try to put them in an iframe).

A greasy framework

Monday, December 14th, 2009

As I think I may have mentioned, on the latest project I worked on I used the Zend Framework for all the server-side development. Over the last few months I’ve developed a love-hate relationship with it. On the plus side it does pretty much everything I need without too much customisation; a few of the negatives, though, are:

  • The quickstart in the documentation assumes way too much background knowledge about configuring a php app, data sources etc. They don’t seem to have considered that a reason many programmers use Zend is that they don’t know much about back-end development and want something to take care of the tricky bits for them. It took ages to get past this stumbling block, with the help of this tutorial (WARNING: the bit on connecting to the database either uses quotes when it shouldn’t or vice versa) and this website with tutorials on various Zend components.
  • Having said it does everything, there are lots of gaps. I’m sure a lot of thought goes into deciding what gets included, and they must get all sorts of requests, but some simple standard things are missing, e.g. a validator to check a confirm-password field. It is, nevertheless, fairly easy to write extensions, but the Zend documentation site should have a lot more, and clearer, information on this. Relatedly, there seems to have been very little thought put into building some sort of community to share plugins, unlike jQuery, for example.
  • You may have sensed that I don’t have a great deal of respect for the documentation. It’s very sloppy, to be frank. There are very few cross-links between sections, and a lot of the classes contain examples I found fairly irrelevant, covering ways of doing things unlikely to be used in a real website (e.g. the examples for querying your database don’t really use the Model-View-Controller structure you’re supposed to use). And, in my view, it’s way too wordy, much of the text being waffley clutter; far more so than better examples of documentation, such as PHP’s and Google Maps’. The website is also very difficult to navigate.

To fix the final gripe I’ve written a greasemonkey script (my first ever one) to replace the existing documentation layout with one a little easier to negotiate.

Before greasemonkey

After greasemonkey

Clever stuff with tables

Monday, September 28th, 2009

I’ve recently been getting to grips with the Zend Framework. I’ve been meaning to blog about it for a while, for there is much to discuss: an appalling introductory tutorial, a class reference which for some reason is nowhere near as easy to use as others… but I will touch on all that some other time.

But I thought I should post this up before I forget. The site I’m working on at present has three kinds of users: superusers, teachers and students, with a separate database table for each kind. The tables for each could have been slightly different but I decided to make them all the same (with dummy entries in the few irrelevant columns, though later I may discover I can discard these). The reason for keeping the structure uniform was that I had an inkling that if I did I could use just one model in Zend to access all three tables… and the inkling was correct.

It took a little debugging and investigation of the Zend_Db_Table_Abstract class, so for the benefit of others: to have a model that works for a number of tables, simply start your class definition as follows:

class Model_DbTable_GenericName extends Zend_Db_Table_Abstract
{
   protected $_name = '';

   public function __construct($type, $config = array()) {
     parent::__construct($config); // pass the config through unchanged
     $this->_name = $type;
   }
   ⋮
}

And to instantiate a DB model use the following:

$Data = new Model_DbTable_GenericName('specifictablename');

Whether or not your tables have to have exactly the same structure depends on how you define your functions for interacting with the data – you might need conditionals if some tables have more or fewer columns than others.

Geocoding in the UK

Sunday, August 16th, 2009

The art of geocoding addresses in the UK is, as I previously explained, a soul-destroying process, fraught with inaccuracy, bugs and convoluted workarounds. And for all that work you end up with a set of points of which a great deal are probably somewhat inaccurate and at least some completely wrong. UK addresses (and probably those elsewhere in the world) are complicated creatures, which Google’s geocoding engine often interprets wrongly.

Postcodes, on the other hand, are rather easier; there is a well-defined relationship between a UK postcode and its corresponding (usually pretty small) piece of the British countryside. But Google’s geocoding API will only return a geocode for the postcode sector (i.e. it will give a geocode for LL12 5 when you searched for LL12 5TH). However, someone did figure out a way of using Google’s local search API combined with Google Maps to geocode UK postcodes. Since he blogged about it the API has changed, so below is an outline of how to geocode a batch of UK postcodes using just some simple php, the current Google AJAX Search API and a little javascript (jQuery isn’t essential, but cuts down on coding a bit). The javascript is the crucial step.

Assuming you have a database full of postcodes and id numbers, and 2 empty columns to store latitude and longitude values, this is how it’s done. (Download source geocode.zip).

1. Create a html page geocode.html with the following content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >

<head>
<title></title>
<meta name="description" content="" />
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<link rel="stylesheet" href="" type="text/css" media="screen" />
<script type="text/javascript" src="jquery-1.3.2.js"></script>
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script type="text/javascript" src="geocode.js"></script>
</head>
<body>
<div id="counter"></div>
</body>
</html>

(Make sure you specify the correct location for your local javascript files)

2. Create a php file (in the same directory), geocode.php, with the following rough structure (it will only be accessed via ajax, so is very stripped down):

<?php
require_once ('mysqlConnect.php'); //or other database connection details
if ($_GET) {
  update_record();
  send_new_data();
}

//gets the next record without a geocode and sends its id and postcode to the browser
function send_new_data() {
  $query = mysql_query("SELECT id, postcode FROM geocode_table WHERE lat = '' AND postcode != '' ORDER BY id LIMIT 1");
  if ($query && mysql_num_rows($query)) {
    $row = mysql_fetch_array($query, MYSQL_ASSOC);
    echo $row['id'] . ',' . $row['postcode'];
  } else {
    echo 'stop';
  }
}

//updates the last record with data sent from the browser
function update_record() {
  $id  = (int) $_GET['id'];   // cast/escape everything before it goes near SQL
  $lat = mysql_real_escape_string($_GET['lat']);
  $lng = mysql_real_escape_string($_GET['lng']);
  if ($id > 0) {
    $update = "UPDATE geocode_table SET lat = '" . $lat . "', lng = '" . $lng . "' WHERE id = " . $id;
    $result = mysql_query($update);
    if (!$result) {
      die('Invalid query: ' . mysql_error());
    }
  }
}
?>

3. Create a javascript file geocode.js, saved in the same directory again (I would paste it here but it keeps breaking WordPress).

4. Running the code

Once you’ve altered the database connection details, and SQL query to suit your setup, simply open geocode.html in your browser. A counter will tell you which record you’re on. To stop the code simply close your browser/browser tab.

How it all works

In a nutshell (ignoring the special case of starting off the loop) the code repeatedly performs the following process:

….in geocode.php, send_new_data() finds a record which has no latitude value and sends its id number and postcode as an ajax response to set_and_get_next(). This keeps track of the id in a global variable and sends the postcode to getPointFromPostcode(), which uses google’s local search to get a geocode. Once it’s found a geocode it passes it to set_and_get_next(), which sends it to geocode.php in an ajax request. There update_record()… well… updates the record, and send_new_data() finds a record which has no la….

Compared to my previous approach of iterating a script over large sets of data, using ajax is very sleek. As with a pure php script, I can run it from a browser, but with much of the resource-intensive scripting taking place on my server or Google’s. And with ajax there’s no problem with the browser timing out from time to time, or baulking at the number of times a page is requested. It’s a little harder to code, and probably less efficient… but I like it. And I’ll definitely be using my shiny new geocoded postcode data.

Anarchy in the UK

Monday, July 13th, 2009

This damn economic crisis/swine flu outbreak isn’t quite that bad yet, but nevertheless there is a very limited sense where the UK is quite anarchic: geocoding addresses using Google Maps.

Having completed my download of addresses for my new Google Maps website the next stage was to geocode them so that I can plot them on the map. I had no idea how tricky it would be when I started out.

The most irritating and fundamental difficulty is that geocodes for UK postcodes are not available for free. The data is owned by the Royal Mail, and there is at least one website where you can buy access to this information (it has a free trial, but I discovered that this is just for about 10 or so geocodes). You can search by postcode on google maps, but if you put a postcode e.g. LL13 7YH into the geocoder API you’re given the geocode for LL13 7 – not accurate enough to be of any real use.

So you have to go for geocoding full addresses instead. The geocodes for these aren’t owned by the Royal Mail but by the Ordnance Survey, who for some reason are less restrictive about sharing the information. But there’s still a long hard slog before you can get the geocodes out.

Google offer a really useful tutorial on geocoding addresses, and this, combined with my approach to iterating over a large number of records, meant I was collecting geocodes in no time. However, it wasn’t as peachy as it seemed.

For example, the address Llantysilio, Denbighshire, UK brings up a pretty accurate geocode for the village of Llantysilio in North Wales. However, the full address, including the postal town is Llantysilio, Llangollen, Denbighshire, UK, and this unexpectedly brings up the geocode for an address on Castle Street, right in the middle of Llangollen. So a more complete address leads to a far less accurate geocode. This is immensely problematic.

In general I was feeding in the longest possible address made up out of the data I had, so in my php script I had something like the following:

while (count($arr_address) > 1 && !$str_lat)
{
  $str_address = implode(', ', $arr_address);
  attempt_geocode($url.$str_name.', '.$str_address.', '.$str_county.', UK');
  attempt_geocode($url.$str_address.', '.$str_county.', UK');
  attempt_geocode($url.$str_address.', UK');
  array_pop($arr_address); // drop the last address component and try again
}

This starts with the longest, most detailed address string, and then gradually cuts the string down (possibly sacrificing accuracy in order to get a passable geocode), with attempt_geocode() exiting the loop on success.

But the fact that longer addresses can lead to incorrect geocodes meant I had to work in a way to start off with shorter addresses, and if that doesn’t get a geocode then gradually shorten the full address and keep trying. So I now have:

attempt_geocode($url.$arr_address[0].', '.$str_county.', UK');
attempt_geocode($url.$arr_address[0].', '.end($arr_address).', '.$str_county.', UK');
while (count($arr_address) > 1 && !$str_lat)
{
  $str_address = implode(', ', $arr_address);
  attempt_geocode($url.$str_address.', '.$str_county.', UK');
  attempt_geocode($url.$str_address.', UK');
  array_pop($arr_address);
}

A long process in order to get a geocode that could still quite likely be wrong, and even if it’s basically correct might not be as accurate as a postcode; but nevertheless an improvement on what I had before.
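attempt_geocode() itself isn’t shown above; as a rough, from-memory sketch of its parsing half: the geocoder of that era, asked for csv output, replied with a line of the form status,accuracy,lat,lng, with status 200 meaning success. The sample response below is made up, and in the real script the result would be stored in the $str_lat/$str_lng variables rather than returned:

```php
<?php
// Hypothetical sketch of the response-parsing half of attempt_geocode().
// The network call is replaced by a sample "status,accuracy,lat,lng" string.
function parse_geocode_csv($response) {
    $parts = explode(',', $response);
    if (count($parts) === 4 && (int) $parts[0] === 200) {   // 200 = success
        return array('lat' => $parts[2], 'lng' => $parts[3]);
    }
    return false;   // caller falls through to the next, shorter address
}

$sample = "200,6,52.971557,-3.172556";   // made-up sample response
print_r(parse_geocode_csv($sample));
```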

A glimmer of hope, though, is that Google Maps itself doesn’t suffer from this issue – both address versions return the same accurate point on the map – and, as someone pointed out to me on stackoverflow, Google Maps is in beta, so maybe the geocoder API just hasn’t been updated to the newer, better address parser, and maybe one day reliable free geocoding in the UK will be a reality. Also, somebody has found a way to geocode UK postcodes by hacking together the Google Maps and Search APIs, and I may well try it, as this address-geocoding malarkey leaves a lot to be desired. (*edit – turns out it’s heavily reliant on javascript, so it can’t be used for geocoding masses of pages without slowing down your browser.)

Finally, if this article wasn’t any help, there’s loads of geocoding links here.

Learning to crawl before you can run

Wednesday, July 8th, 2009

Crawling websites for data using php running in a browser

I’ve had an idea for a website for almost a year now (won’t spill the beans just yet though) and today I finally started work on it. To lift the veil of secrecy a little, I’m putting information about certain places onto a Google Map, because somehow nobody has thought of doing it yet.

All that information about the places is already available on the internet, just not embedded in a map, so my first step was to crawl some websites to get hold of it. A slight problem: this required running a script to trawl automatically through all the pages. I only know PHP, which as far as I was aware could only be run in a browser, and browsers time out after a while, so it would be impossible to just leave it running.

A little more research revealed that it is possible to run PHP scripts as stand-alone entities outside the browser, but only using something called CGI. I had no idea what CGI was, and my hosting company don’t allow you to use CGI anyway. But I did manage to find another solution.

Although a php script crawling lots of pages for data would cause a browser timeout, a script crawling just one page almost certainly wouldn’t. So what I had to do was:

  1. Write a script that crawls the data of one page but…
  2. … checks what page was last crawled and moves on to the next one when it starts and…
  3. … tells the script to execute again once it’s finished.

1. will differ depending on your needs, but my solutions to 2. and 3., I believe, provide a good technique to crawl web pages if you only know PHP.
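Step 1 will be entirely site-specific, but in outline it is just fetching a page and pulling fields out of the markup. A toy sketch, in which a sample string stands in for the fetched page (in the real script the HTML would come from file() or file_get_contents()) and the class name and pattern are invented:

```php
<?php
// Step 1 in miniature: extract one field from a crawled page.
$html = '<html><body><h1 class="site-name">Riverside Camping</h1></body></html>';

if (preg_match('/<h1 class="site-name">(.*?)<\/h1>/', $html, $matches)) {
    echo $matches[1];   // Riverside Camping
}
```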

Solution to 2.

Each iteration will presumably write its information to a database. Provided you’re iterating over an integer (e.g. the webpages are of the form http://www.thesite.com/thepage?id=theinteger), you’ll probably be storing that integer in your database. Then the following code at the beginning of your script will advance you to the next web page to crawl.

 $last_entry_query = mysql_query("SELECT theinteger FROM thetable ORDER BY theinteger DESC LIMIT 1");
 if ($last_entry_query && mysql_num_rows($last_entry_query)) {
    $last_entry_row = mysql_fetch_array($last_entry_query, MYSQL_ASSOC);
    $last_entry = $last_entry_row['theinteger'];
 } else {
    $last_entry = 0; // one less than the first entry's integer, to start the script off
 }
$current_page = file('http://www.thesite.com/thepage?id='.($last_entry+1));

You can stop and start the crawl whenever you like as the database will always tell you which page to crawl next.

Solution to 3.

Really simple, this one. At the end of the script you need to run it again, and what better way than redirecting the browser to the same page?

if (mysql_affected_rows() == 1) {
    header('Location: http://localhost/campcrawl/index.php?id='.$last_entry);
}

Still, one little tweak: browsers tend to limit the number of times a page can be redirected, in order to avoid infinite loops. In Firefox, to override this go to about:config and change network.http.redirection-limit to something really big. If there is a danger of an infinite loop you can limit the number of iterations in your script with a counter or a timeout, but for me it wasn’t a problem, as just closing the browser tab stopped my script.
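If you do want a belt-and-braces limit in the script itself, one way (the function and parameter names here are hypothetical) is to carry an iteration counter along in the query string and refuse to redirect once it hits a ceiling:

```php
<?php
// Guard against an infinite redirect loop: pass an iteration count
// with each redirect and stop once it reaches a limit.
function next_url($id, $count, $limit = 500) {
    if ($count >= $limit) {
        return false;            // stop; don't redirect again
    }
    return 'index.php?id=' . ($id + 1) . '&count=' . ($count + 1);
}

echo next_url(42, 3);            // index.php?id=43&count=4
var_dump(next_url(42, 500));     // bool(false)
```

The crawl loop would then call header('Location: ' . $url) only when next_url() returns a string.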

The reason this technique is workable is that, even though it requires a browser to run the script, today’s multi-tab browsers and the fact that all the calculation is done on the server mean that it doesn’t infringe on whatever else you’re using the browser for (aside from occasionally having to refresh the tab running the script, as sometimes it stops for seemingly no reason, though that might just be bad programming on my part).

So that’s stage one nearly completed (2,000 of about 9,000 pages trawled so far, but at this rate it should be finished within the hour). Now on to actually doing something with the data.