LRPdesk

Design, Construction, Meanderings

Notes from a small code base

Since I never got around to talking about many of the things I had done in the LRPdesk PHP code base, I thought I’d give a rough-and-ready run down of a couple of points, partly because of similarities to things that were being discussed in the office today, and it’ll help to organize my thoughts.

Preserving form parameters across logins

A couple of months ago I talked about the PHP auto load function. This would have to be included by every “page” file in the system (we wouldn’t rely on the auto_prepend_file mechanism because we can’t guarantee access to the relevant web server configuration files on all hosts and putting an auto_prepend_file directive at the top of a “page” file rather than a “require_once” directive is less readable). We can use this always-included file to help deal with one of the issues with a system that makes use of user logins; sessions timing out too quickly resulting in lost work.

For many reasons a session can be useful, and one of the main ones is to keep a user logged in to a site so they don’t have to re-authenticate for every action. When a system is dealing with sensitive data these are often expired after a period of inactivity, so that if a user forgets to log off the system will automatically log them off, reducing the chances that someone will later happen by and use the user’s credentials. However there is a down side; if a user takes a long time to finish using a page then they could be logged off in mid-activity. Consider a form with a “details” area that’s a text box of unlimited length and a session time out of twenty minutes. The user could spend 30 minutes giving details, and on submitting the page be told they have timed out. Under some circumstances hitting the back button to retrieve the text that had been input (and save typing it all in again) will fail to make that text available; half an hour of work would be lost.

By putting logic to handle logins in our “prepend” file, and by using a convention in the code base with respect to form variable names, we can easily:

  • Perform basic form variable sanitisation (so it doesn’t need doing throughout the code base)
  • Force the user to log in again
  • Preserve the form variables (the “details”) across mutlitple login attempts until one succeeds or the user navigates away.
  • Perform session “upkeep” to mark sessions as continuing to be active.

Here’s what we’d add to the file to approach this:

// At this point we could ensure that the web page is being visted using HTTPS
// (SSL-encrypted HTTP) by inspecting $_SERVER['HTTPS']

// get_magic_quotes_gpc could be configured in any manner; call stripSlashes on all
// form data here if it is so  we don't have to check for it elsewhere in the code base.
if (get_magic_quotes_gpc()) {
        foreach (array($_GET, $_POST, $_COOKIE) as $array) {
                foreach($array as $key => $value) {
                        $array['key'] = stripslashes($value);
                }
        }
}

// PHP session support (re)starting; needed by every page that makes use of session
// information
session_start();

// Force and handle a log in attempt:
if (!LRPDesk_Security_Logins::IsLoggedIn()) {

        // Not logged in.
        $error = "You must be logged in to continue.";

        $form = new LRPDesk_FormScrubber_Login();

        if ($form->isValid()) {

                LRPDesk_Security_Logins::AttemptLogin($form->getUsername(), $form->getPassword());

                $error = "Error logging in.  Invalid username/password.";
        } else {
                $error = "Username and/or password is invalid.";
        }

        $error .= " Session cookies must be enabled to log in.";

        if (!LRPDesk_Security_Logins::IsLoggedIn()) {
                // Either no attempt, or the attempt failed.  Generate the error page.

                $smarty = LRPDesk_Views_Smarty::GetSmarty();

                $smarty->assign('error', $error);

                // Preserve query string.
                $smarty->assign('post_url',$_SERVER['REQUEST_URI']);

                // Preserve POST paramaters that aren't used (and therefore are to be passed
                // through to the next page).
                $preserve = $form->getUnusedPostParamaters();

                $smarty->assign('post_hidden_params',$preserve);

                // Display the log in page.
                $smarty->display('login.tpl');

                // Do no further processing.  The user is not logged in.
                exit;
        }
}

Without going into too much detail about the classes we’ve not discussed yet:

  • LRPDesk_Security_Logins::IsLoggedIn() performs a check of the session information to see if the user is logged in (which will fail if the session has expired).
  • LRPDesk_Security_Logins::AttemptLogin(…) attempts, using a provided username and password, to log the user onto the system (creating a new session identifier as needed).
  • The LRPDesk_FormScrubber_Login parses the submitted login form, validates it, and makes it’s information easily available, including the names of the username, password and submit form parameters that by convention within the code base are only used for log in attempts, and a way to access (as an associative array) the form parameters that were provided that are not used for log in attempts (the “unused” parameters we’re preserving across the login).
  • We preserve the query string (”GET” parameters), and provide “hidden” input fields for POST parameters not used in the login attempts so that they continue to be provided to the receiving page.
  • The “smarty” object is used to create the login form page using a template file

Now if a session expires (or a user is not logged in in the first place) and form data is submitted to a page including this file the user is prompted to log in and, on doing so, the action they were in the middle of continues as if uninterrupted (provided it only relies on POST and GET information.

Session vulnerabilities

Every time a session-supporting browser makes a request to a session-enabled site it is given (or provides) a session identifier. These are intended to be unique, with each user have their own identifier that then makes it clear to the site which requests come from the same user. These session identifiers are valuable in a security-critical system, as a malicious user who can provide a session identifier belonging to another user who’s currently logged in would impersonate that user.

Logging a user in is theoretically largely a case of checking a provided username and password against one held in the system and storing this in the session information so we know the user has logged in and can stop asking for their password. From this point the session ID effectively holds the user authentication. To protect against session fixation attacks, where a malicious user obtains a session ID of a logged-in victim and users their session ID, there are a few things you can do. Wikipedia’s Session Fixation and Session hijacking pages cover the subject of how this exploitation happens and how to counteract it reasonably well - I strongly advise reading them.

For the purposes of this project it is probably sufficient to regenerate the session ID following each successful log in, and SSL encryption of traffic - while advisable - need not be mandatory. For a more security-critical project the session ID should be regenerated for every request and an SSL certificate must be used. While some people would also advocate locking a session to a particular IP address to help avoid exploitation this, in my opinion, won’t add any security beyond these - if a malicious user has exploited the victim’s machine to the point where they can learn the session ID from their browser they can probably send our system requests from that victim’s machine.

Of shoes, of ships, of sealing wax…

It’s been a while since I’ve written anything. Partly this is because I realised that most of my entries on this blog were meandering, and not really useful to anyone. Partly it’s because I’ve been playing games, watching TV and starting to plan some calligraphy rather than do “work”. Partly it’s because while, if I should ever finish it, the LRPdesk system might gain a handful of users it’s a “while I feel like it” project - one with neither meaningful deadlines or a paycheck incentive.

I was tempted to post tonight to describe what the Java Event Dispatch Thread is and how you should interact with it to avoid freezing your Java GUI, after a peruse through some particularly peculiar code at work written by someone who clearly didn’t quite understand it at the time. I opted against it in the end, being distracted by cooking tea, and rather tired after a day in the office.

Undoubtedly I will eventually come back to this project, but perhaps not for a while. I’ll let you know.

Code Structure and Coding Standards

Code Structure

One of the problems with PHP as a programming language is the lack of namespacing. This was due to be corrected in PHP 5, but this has slipped to PHP 5.3 (not yet released) and may slip to PHP 6. The lack of namespacing increases the risk of function name clashes, particularly if any third-party libraries are used. Even within the standard PHP libraries there is no consistent naming scheme. There is also no standardised way of structuring a code base.

One way to overcome the lack of namespacing in PHP is to package function calls as static (class) methods in non-instansiable classes. We still need to ensure a lack of clashes in the class names, but a prefix to our class names should stop this from being an issue, while keeping the length of function names manageable.

We’re developing using OO principles, and have a high-level OO design. Our code base structure should reflect this to make location of relevant classes easy during development and maintainance. This will also let us make use of PHP’s autoload feature. This automatically loads classes that have not yet been defined on their first use in the currently executing code. An autoload function I’ve made use of in the past is:

/**
 * Automatically loads a class on it's first use if it's not already defined.
 * See the PHP5 manual for details.
 */
function __autoload($class_name) {

        $valid = false;

        if (isset($class_name) && is_string($class_name) && strlen($class_name) <> 0) {

                $pattern = ‘/^[A-Z][[:alnum:]]*(_[A-Z0-9][[:alnum:]]*)*$/’;
                $valid = 0 != preg_match($pattern,$class_name);
        }

        if (!$valid) {
                die(”Class name $class_name not valid”);
        }

        $class_arr = explode(”_”, $class_name);
        $class_file = implode(”/”, $class_arr);
        $class_file .= “.php”; 

        $full_path = LRPDESK_DIR . ‘classes/’ . $class_file;

        if (file_exists($full_path)) {
                require_once($full_path);
        } else {
                die(”Fatal Error: Class file, $full_path, could not be found.\n”);
        }
}

This defines the structure of our class library. If we have a user class LRPDesk_User it would be found in the LRPDESK_DIR/classes/LRPDesk/User.php file, while a model class LRPDesk_Model_UserStore would be in LRPDESK_DIR/classes/LRPDesk/Model/UserStore.php. Using this structure it is easy to deduce the file name from the class name, and vice-versa.

System security dictates that only those PHP files that are to be used directly by the users of the system should be within the DocumentRoot of the website. This minimises the opportunities a malicious user will have to attack our system. As well as our “classes” directory we will therefore need a “pages” directory to contain those files. Further directories for third-party components, documentation, etc. will be added as needed.

Coding Standards

There are many views on what coding standards should be, and if they are needed at all. Some standards dictate when code should be indented and the placing of braces. Other standards concentrate on the order of arguments in functions and methods. Others might be to do with the order of the initialisation of properties within a constructor. Others might be to do with when and how tests are written.

No matter what the details of a set of standards the purpose is always the same; to provide a minimum quality of code and a commonality of structure. By having code structured, variables named, and parameters ordered in a consistent fashion it is easier to dip into the code for development or maintenance.

Consistency of code is easy to appreciate one you have been exposed to the alternative. Anyone who has been brought into a project that has had several developers over a number of years without any standards is likely to have tripped up over different calling conventions or variable naming schemes. In the best cases such issues are kept isolated in individual files, but most of the time different parts of the same routines are inconsistent with each other, introducing bugs and making maintenance and further development more difficult.

Code indentation and formatting conventions are difficult. While there are tools such as indent that will reformat your code along specified lines there always seems to be an exception to a rigid set of guidelines where the code is easier to understand if they’re broken a little. Variable naming is more subject to the whims of the developer than the code structure, which by necessity reflects the task at hand.

All coding standards eventually come down to a list of dos and don’ts. The following list is based on some of my habits, and on conventions I’ve picked up in various languages which we’ll apply to PHP.

  • Logical Structure
    • if, else, and other control structures that can either user braces to execute a group of instructions or omit the braces to execute a single instruction should be written with the braces even if only a single instruction is enclosed within those braces.
    • Recursive functions should generally be avoided as many developers find them confusing.
    • In any related set of functions the order of common arguments should be consistent.
  • Comments
    • At the start of each function and method there should be a comment detailing it’s purpose, arguments and return value(s).
    • Complex pieces of algorithm used in a routine should be commented explaining what is happening and why.
  • Naming
    • Functions and variables should begin with a lower case letter and be written in CamelCase.
  • Objects
    • Static (class) methods, variables and constants should begin with a capital letter rather than the normal lower case letter.
    • Methods and attributes should have their visibility explicitly declared, and that visibility should be as restrictive as possible.
  • Indentation
    • All indentation will be done with tabs to allow easy indent resizing in text editors.