Advanced Web Application Security Training

In September I had the opportunity to attend a two-day training on web application security held by Securitum. I thought it would be an excellent opportunity to gain more insight into the best practices to follow while working on the Django web application around which my job revolves every day.

The training was structured in sections, one for each of the vulnerabilities we analysed. Each section opened with a theoretical explanation, followed by a “hands-on” exercise in which we had to exploit the flaw we had just learned about in order to bypass the security of a fictional system and gain access to information we should not have been able to reach.

Among the vulnerabilities I learned about during the training, I was particularly struck by the XML External Entity (XXE) one. If you are not familiar with it, it revolves around the possibility offered by XML to define your own entities in a document, as in the following code fragment:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///dev/random" >]>

<foo>
&xxe;
</foo>

The possible values after the SYSTEM keyword can be:

  • a file on the server, whose contents will replace the entity;
  • a directory on the server, whose file listing will replace the entity;
  • an HTTP URL, which the parser will request and whose response will replace the entity.

Since many XML parsers resolve external entities by default, this can be an easy attack vector for reading the contents of a file or service that should not be accessible remotely, as in the screenshot below:
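
On the defensive side, here is a minimal sketch, in Python, of one way to refuse such documents before they reach the application. It is only an illustration: the helper parse_untrusted_xml and the DoctypeForbidden exception are names made up for this example, and rejecting every DOCTYPE is an assumption that only holds if legitimate clients never need to send one.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Entities can only be declared inside a DTD, so refusing the DOCTYPE
    # refuses every custom entity (internal or external) along with it.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML from an untrusted source (hypothetical helper)."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # first pass: only looks for a DOCTYPE
    return ET.fromstring(data)  # second pass: build the element tree


# The document from the example above (without the encoding declaration).
MALICIOUS = b"""<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///dev/random" >]>
<foo>&xxe;</foo>"""
```

Calling parse_untrusted_xml(MALICIOUS) raises DoctypeForbidden, while a plain document such as b"<foo>hi</foo>" is parsed as usual. Parsing twice is wasteful, but it keeps the sketch short and independent of any third-party library.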

Another vulnerability we analysed involves the famous pickle module in Python. This module serialises and deserialises Python objects as byte streams using a specific stack-based language: basically, the output of the pickle operation is a program which is then able to recreate the serialised Python data structures. The problem arises when an attacker is able to modify the representation of a pickled object to inject code that gets executed remotely when pickle deserialises the object. For example, the code fragment below tells Python to call the os.system function with echo hello world as a parameter, which will print the string hello world on the server:

cos
system
(S'echo hello world'
tR.
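
The same mechanism can be reproduced safely from Python itself. The sketch below is only an illustration under a few assumptions: Payload, NoGlobalsUnpickler and restricted_loads are hypothetical names, and the harmless built-in len stands in for os.system. It first shows that unpickling runs a callable chosen by whoever produced the bytes, and then how overriding find_class on pickle.Unpickler refuses such payloads.

```python
import io
import pickle


class Payload:
    """Stand-in for attacker-controlled data (hypothetical and harmless)."""

    def __reduce__(self):
        # pickle stores "call len('hello world')" instead of the object;
        # a real attack would name os.system and a shell command here.
        return (len, ("hello world",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any class or function."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but only for plain built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


payload = pickle.dumps(Payload())
```

Here pickle.loads(payload) returns 11, the result of the call, rather than a Payload instance, whereas restricted_loads(payload) raises pickle.UnpicklingError. Restricting globals narrows the attack surface but is not a complete defence: the Python documentation still advises never unpickling data from an untrusted source.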

The last vulnerability is a variation of NoSQL injection affecting MongoDB, which allows you to run arbitrary JavaScript code on the server for some of its operations (e.g., in the $where clause of a query). This can obviously be used as an attack vector for accessing records we should not have access to. For example, if the server uses this code to query the users collection in our database

db.myCollection.find({ active: true, $where: function() { return obj.username == $user && obj.password == $pass; } });

and we send a request to the server formatted as follows

https://www.example.com/query?user="admin";}});//&pass=

this gets translated into the following MongoDB query

db.myCollection.find({ active: true, $where: function() { return obj.username == "admin";}});// && obj.password == "" ; } });

which could possibly allow us to log in as the ‘admin’ user (if there is one in the collection).
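
To make the root cause explicit, here is a minimal Python sketch contrasting the two ways of building the query. The function names build_where_unsafe and build_filter are hypothetical, the field names are taken from the example above, and comparing plain-text passwords only mirrors that example (a real system would compare password hashes).

```python
def build_where_unsafe(user: str, password: str) -> str:
    """Reproduce the flaw: request values are pasted into JavaScript source."""
    return (
        "function() { return obj.username == " + user
        + " && obj.password == " + password + "; }"
    )


def build_filter(user, password) -> dict:
    """Build a plain field-equality filter; no JavaScript is involved."""
    # Insisting on str keeps operator documents such as {"$ne": None}
    # from being smuggled in through a JSON request body.
    if not isinstance(user, str) or not isinstance(password, str):
        raise TypeError("user and password must be strings")
    return {"active": True, "username": user, "password": password}


# The value sent in the "user" parameter of the request above.
attack = '"admin";}});//'
```

With the first function the attacker's text becomes part of the code that MongoDB evaluates. With the second, the same text is just an unusual username that matches no document, because the dictionary handed to the driver's find call carries it as data.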

Notes From Uncle Bob's Clean Code Videos

Over the last few weeks, I have had the opportunity to dedicate a couple of hours of my Mondays to watching the “Clean Code” video series by Robert C. Martin (more informally known as Uncle Bob). In the past, I tried to read the book on which these videos are based, but I always ended up not finishing it due to the usual excuses, like lack of time, procrastination, etc. Having a weekly meeting organised at my workplace to watch the videos together and discuss them with my colleagues afterwards has been a vastly superior plan to my own, and indeed it made me stick with it and watch all the available videos.

During the sessions I jotted down some notes on the topics I thought were most relevant and applicable to my day-to-day job, and I am going to report them in the remainder of this post:

Clean Code

  • The messier the code, the more time it will take to add features later in the project’s codebase.
  • It’s not that programmers aren’t working hard, they are; but the code is slowing them down.
  • Adding more people at the end of the lifecycle of a project is not going to make things go faster; quite the opposite.
  • The big redesign is not going to work either; it will start a turtle-and-hare race between the old and the new systems.
  • A rigid system is a system which resists change, so that changes are very difficult to accomplish and estimates become inaccurate.
  • A fragile system is a system that starts to malfunction in different ways as soon as some part of the code changes.
  • An opaque system is a system for which reading the code is not going to help in understanding the original author’s ideas and purposes.

Names++

  • Every time you name something, the name should reveal your intentions. If you need to write a comment after the name, the name is not good enough.
  • If you have to go look in the code to understand a name, the name has failed its purpose.
  • A name disinforms when it suggests to the reader an intention different from the author’s.
  • There’s no longer any need for type prefixes (Hungarian notation): IDEs and compilers are smart enough to take care of this for you.
  • Boolean variables should be named as predicates, e.g. isEmpty.
  • Method names should be verbs, as they express actions (except the methods that return boolean values).
  • The larger the scope of a variable, the longer its name should be (if the scope is small, the variable is used very close to its declaration, and hence it’s easy to look up what it stands for).
  • Methods which are called from many places should have nice and short names; private methods which are called in very few places and with a small scope should have long and detailed names.

Functions

  • Functions are the first tier of code containers.
  • A function should do one thing, do it well, and do it only.
  • Today’s computers and compilers are so good at their jobs that we no longer need to worry too much about efficiency, so we should focus on readability first.
  • Long functions are where classes go to hide.
  • If a function operates on multiple levels of abstraction, it’s doing more than one thing.
  • You can make your function do one thing only by extracting functions from it until you can’t.
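
As an illustration of “extracting until you can’t”, here is a small Python sketch. The order-report scenario and all the function names are invented for this example: report_before mixes filtering, parsing and formatting, while report delegates each of those steps to a function that does one thing.

```python
def report_before(lines):
    """One function, several levels of abstraction."""
    total = 0.0
    count = 0
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            total += float(line.split(",")[1])
            count += 1
    return f"{count} orders, total {total:.2f}"


def is_order(line):
    """An order line is non-empty and is not a comment."""
    return bool(line) and not line.startswith("#")


def order_lines(lines):
    """Strip every line and keep only the order lines."""
    return [s for s in (line.strip() for line in lines) if is_order(s)]


def order_amount(line):
    """The amount is the second comma-separated field."""
    return float(line.split(",")[1])


def format_summary(amounts):
    return f"{len(amounts)} orders, total {sum(amounts):.2f}"


def report(lines):
    """Reads like a summary of the steps; the details live one level down."""
    amounts = [order_amount(line) for line in order_lines(lines)]
    return format_summary(amounts)
```

Both versions return the same string for the same input; the second one simply gives a name to every step.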

Function structure

  • We should treat each argument as a liability, not as an asset.
  • Usually we should strive for 0 to 2 arguments; with 3 it already becomes a problem to remember the order.
  • Most of the time, when we pass a boolean argument to a function, we are declaring that the function is actually doing two things: one for the true case, and one for the false case.
  • If a function accepts a null value, it’s like the boolean case: we expect it to behave in one way when the object is null, and in another way when it is not. Write two functions instead.
  • The stepdown rule says that public, more general methods should go at the top of the class, while private, more detailed methods should go at the bottom. This creates a new level of abstraction each time we scroll down the code.
  • Interfaces allow us to invert the source code dependency between two modules A and B, so they are not required to be deployed together anymore.
  • When we have a plug-in structure, it allows the dev team to work independently on different parts of the code. Switch statements and long chains of if-else statements break these premises.
  • Functional programming is the programming style which does not allow assignments and side effects. This makes functions “pure” mathematical functions, so that the same inputs will always result in the same output.
  • Temporal coupling is when the order of function calls matters.
  • Queries (e.g., getters) should not change the state of an object and just return a value; commands should always return void, possibly throwing exceptions for error cases.
  • Individual functions should have a limited knowledge of the overall system.
  • Structured programming says that a program should be composed of three basic blocks: sequence, selection and iteration.
  • Error handling is important, but if it obscures the logic, it’s wrong.
  • Always use unchecked exceptions (i.e., make your exceptions derive from RuntimeException).
  • Most of the work should be done by the name, the context and the scope of the exception thrown, so that no message should be needed.
  • Functions should do one thing, and error handling is one thing.
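
Two of the points above, avoiding boolean arguments and separating queries from commands, can be sketched in a few lines of Python. The Account class is a made-up example, and ValueError stands in for the unchecked exceptions mentioned in the notes, since Python has no checked ones.

```python
class Account:
    """Hypothetical example: one query, two commands, no boolean flags."""

    def __init__(self, balance: int = 0) -> None:
        self._balance = balance

    # Query: returns a value and changes nothing.
    def balance(self) -> int:
        return self._balance

    # Commands: change state, return nothing, raise on error.
    # Two methods replace a single move(amount, is_deposit) with a flag.
    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
```

A caller reading account.withdraw(20) does not need to remember what True or False would have meant, and calling balance() any number of times never changes the outcome.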

Form

  • Comments should be rare: each one should deserve our whole attention, and readers should be glad it’s there.
  • It should be a prerogative of a developer to write code that expresses itself. In this light, comments can be seen as failures.
  • Over time, comments degenerate into lies because usually the code changes but the comment does not.
  • Worthwhile comments are:
    • legal comments
    • informative comments (e.g., the format a regular expression is trying to match)
    • TODO comments
    • public API comments
    • expression of intent comments
  • Bad comments include:
    • mumbling
    • redundant explanations
    • mandated redundancy
    • journal comments
    • big banner comments
    • closing brace comments
    • attribution comments
    • HTML in comments
    • non-local comments
  • Formatting is about communicating, and communicating should be the first concern of programmers.
  • Things related to each other should be vertically close to each other. Vertical distance is a measure of how related two things are.
  • Use the IDE reformat feature on little snippets, not on the whole file, otherwise merges would become a nightmare.
  • Polymorphism is the key to independent deployability and plug-in architectures.
  • Data structures have public variables and (virtually) no methods, while classes have private variables and public methods.
  • Data structures and switch statements are as related as classes and polymorphism are.
  • Classes protect us from new types, but suffer from new methods; data structures protect us from new methods, but suffer from new types.
  • A row in the database is mapped to a data structure, not a class. This is called the impedance mismatch.
  • The rule of boundaries states that, when identifying a boundary, the information should go away from the concrete and toward the abstract.
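
The trade-off between data structures and classes mentioned above can be sketched in Python as follows. The shapes are a made-up example, and the chain of isinstance checks plays the role of a switch statement.

```python
import math
from dataclasses import dataclass


# Data-structure style: public fields, no behaviour.
@dataclass
class CircleData:
    radius: float


@dataclass
class SquareData:
    side: float


def area(shape) -> float:
    # Adding a perimeter() function touches no existing code, but adding
    # a new shape means revisiting every function written like this one.
    if isinstance(shape, CircleData):
        return math.pi * shape.radius ** 2
    if isinstance(shape, SquareData):
        return shape.side ** 2
    raise TypeError(f"unknown shape: {shape!r}")


# Class style: private fields, public methods, polymorphic dispatch.
class Circle:
    def __init__(self, radius: float) -> None:
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


class Square:
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2
```

In the class style, a new Triangle only needs its own area method and no existing code changes, whereas a new perimeter operation has to be added to every class.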

Test-Driven Development

  • A test suite with good coverage eliminates the fear of change.
  • Tests let you clean your code.
  • The three laws of Test-Driven Development are:
    • you should write no production code unless you have a failing test first
    • you should stop writing tests as soon as you have a failing one
    • you should write only the code that makes the failing test pass
  • Writing tests before the production code makes the production code testable (or decoupled).
  • Creative work requires iterations and rework.
  • The tests test the production code, and the production code tests the tests.

Advanced Test-Driven Development

  • Refactoring is something you do continuously; you never ignore it, you never let it slide, you also don’t put it on a schedule or a plan.
  • To set up a working environment, start by writing an empty test and making it pass.
  • Ideally you should write all the degenerate tests first; if that’s not possible, make sure you write the next degenerate test as soon as possible.
  • Every unit test should have only one assertion in it (Single Assert rule).
  • According to the Triple A rule, every unit test should be divided into three sections:
    • arrange
    • act
    • assert
  • A single assertion is meant to be a single logical assertion, not a single “physical” call to the assert method.
  • As the tests get more specific, the code becomes more general.
  • Tests cannot fully constrain a program; they can add constraints but they can’t specify the final behaviour. Tests can prove a program wrong, they can never prove it right.
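
As a small illustration of the Triple A layout and of starting from a degenerate case, here is a sketch based on Python’s unittest module. The Stack class and the test names are invented for this example.

```python
import unittest


class Stack:
    """Tiny class under test (hypothetical)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class StackTest(unittest.TestCase):
    def test_new_stack_is_empty(self):
        # Degenerate case: nothing has been pushed yet.
        stack = Stack()
        self.assertTrue(stack.is_empty())

    def test_pop_returns_last_pushed_item(self):
        # Arrange
        stack = Stack()
        stack.push(1)
        stack.push(2)
        # Act
        popped = stack.pop()
        # Assert (one logical assertion)
        self.assertEqual(popped, 2)


def run_tests() -> unittest.TestResult:
    """Run the two tests above and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StackTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test has a visible arrange, act and assert section, and each checks a single behaviour, so a failure points directly at what broke.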

Architecture

  • Architecture is the set of decisions that serve as the foundation for a software system.
  • A good architecture “screams” intention.
  • A good architecture allows you to postpone decisions about DBs, frameworks, etc.
  • A use-case is the formal description of how a user interacts with a system in order to achieve his/her goal.

The Transformation Priority Premise

  • A transformation is a change in code where the structure does not change, but the behaviour does.
  • The usual “Red-Green-Refactor” cycle might then change into “Red-Transform to Green-Refactor”.
  • Usually transformations make the code more general.
  • Duplicated code is always specific, never general.
  • Applying the transformations in the priority order helps to come up with better algorithms.

My First Contribution to Open Source!

Hello again!

Since I started my new job last June, I have been fulfilling one of my long-standing desires: to learn everything I could about the Django framework. Unsurprisingly, this craving had been postponed many times by several factors, but now that I am getting my hands dirty with it every single day, I can really understand the admiration and regard for it that you can easily read about all over the Internet, and I must say I agree with it (except in some rather peculiar circumstances :P)!

Anyway, to understand better what’s going on and why things work in a certain way (and why some others don’t), I thought it would be a good idea to find some Django projects on GitHub, read their code to see how they organized it, and maybe apply some of the knowledge I have been acquiring by solving some of the bugs reported by the end users.

One of the first Django-based projects I laid my eyes on was the official website for Python, so without reading much of its code (I am still in the process of reading it thoroughly for the first time) I jumped immediately into the issue tracker to see if there was an issue simple enough for a beginner like me to solve. In fact, I found one that seemed quite easy to start with:

The layout is behaving in a quite bizarre way here

It was quite easy to see that something was wrong with the event search results: the strings ‘From at’ and ‘though at’ are collapsed one on top of the other, they are not where they are supposed to be, and there is no indication of the start and end dates and times of these events.

What was a little bit less easy to see for a beginner was:

  • there is an easy way to set up a development environment to start working on the problem, but it relies on Ansible, which is not available under Windows (which is installed on my laptop) unless you install Cygwin and a whole bunch of packages, and after doing all of this I was still not able to spin up and provision the VM auto-magic-ally;
  • there is a manual way to set up a development environment (which I ended up using), but it took a lot of trial-and-error cycles to understand that I had to download the 14.04 version of Ubuntu, some older libraries (as the project is based on Django 1.7), and so on and so forth;
  • the search is implemented through Haystack and Elasticsearch (neither of which I had used before), so it took me some time to install and configure them properly.

After a week of study and failed attempts, I figured out that the problem with the positioning of the two strings was caused by the reuse of a CSS class designed for another page (in which they are indeed displayed on the left side of their container) and that the start and end dates were missing because they were being retrieved in the wrong way from the Haystack results. So, without further ado, I went on and created a pull request… in fact, my first pull request ever!

Since it was so well received by the project’s maintainer, I felt motivated to try to go through the process once again… but this time I aimed a little higher: I chose to work on an issue reported by Guido van Rossum himself! The misbehaviour in this case was that, when the width of the browser window was under a certain value, a spurious portion of the blue background was shown, and this also caused the horizontal scrollbar to appear. One of the contributors suggested that it was likely to be caused by the implementation of the dropdown menus and, after some investigation in the CSS definitions, I was able to confirm that this was indeed the case:

Interestingly, the issue appeared only when the dropdown menus associated with the Socialize and Sign In items were hidden, and would promptly vanish as soon as the dropdown menus were shown. Based on this observation, I simply changed the way in which these menus are hidden (effectively removing them from the page layout), so that the problem could not (and would not) show up by design.

After some testing and with the blessing of the Benevolent Dictator For Life, my pull request was finally merged and deployed onto the production server!

TL;DR (long story short, in today’s Internet lingo): I fixed a couple of bugs in the official website of Python, I enjoyed the whole process, the reception was better than I expected, so I will probably do it again in the near future.

AWS Technical Essentials

Hello! My name is Stefano, and this is the first post I get to write on a blog, ever! I waited a long time before sitting down to create my personal blog and write this post, held back by the usual “Who in the world would care about another programmer’s blog?” and “I don’t think I have anything interesting enough to be published online…” thoughts that I guess wander through the mind of anyone about to do it. In the end I figured it would be an encouraging and “unlocking” experience to just start with baby steps and see where this will go, so… here I am!

In this first post, I would like to talk about an Amazon Web Services seminar I got to attend last week in the beautiful Beaufort House in Aldgate, central London.

It is called AWS Technical Essentials, as it is supposed to be a one-day overview and crash course on the most important services offered by the Amazon Web Services platform, mainly aimed at programmers and software engineers completely new to or with little experience with AWS. The course was divided into both theoretical presentations and hands-on labs, which I found very useful to immediately check and employ the knowledge I had just acquired by attending the former.

Some of the notes I took (by pen and paper!) include the following:

  • Each region comprises at least two availability zones (AZs), which are basically clusters of data centers
  • Currently there are 14 regions and 35 availability zones
  • Not all the regions have the same price
  • The edge locations are some special cache locations where end users access AWS services
  • MFA stands for Multi Factor Authentication
  • AMIs (Amazon Machine Images) are templates for EC2 (Elastic Compute Cloud) instances
  • CloudFormation is the way to go to create an EC2 template based on AMIs
  • Once you reach a steady-state service, it is far more convenient to switch to reserved instances (which save up to 75% with respect to on-demand instances)
  • Once you enable versioning in S3 (Simple Storage Service), there is no way to disable it (you can only suspend it)
  • You can retrieve earlier versions of a file in S3 by deleting the delete markers on that particular file
  • Lifecycle rules allow you to automatically move your files to a cheaper environment, based on time rules specified by you
  • EBS (Elastic Block Store) is like a hard drive for EC2 instances. EBS volumes are persistent, i.e. they can live on even after the EC2 instance to which they are attached has been terminated
  • Everyone is responsible for security
  • The best thing is to build security together with your application, not adding it later on
  • Federating users means allowing users from other services/locations to log in and authenticate
  • Roles and groups are not the same thing: groups are “containers” for accounts, while roles are the entities which actually receive authorizations and permissions
  • There is only a layer of IAM (Identity and Access Management) groups, i.e. you can’t create a group within a group
  • If you don’t specify a role for an EC2 instance while creating it, there’s no way to do it afterwards, so you’ll need to recreate it
  • Authentication is gaining access to a service; authorization is being able to do stuff
  • An identity provider is for federating users (a.k.a. Single Sign-On)
  • Vertical scaling means giving more CPU, RAM, disk, or any other resource to a single instance of a service; horizontal scaling means spinning up a new instance of the service and splitting the load
  • Durability means “is my data going to be there?”; availability means “is my data available now?”
  • ELB (Elastic Load Balancing), CloudWatch and Auto Scaling form an environment for dynamic growing/shrinking web applications
  • With sticky sessions, requests from the same client will always end up at the same server
  • User data is a script which can be run just after the EC2 instance has been created
  • Launch Configurations are similar to CloudFormation scripts, but include also auto scaling information
  • Trusted Advisor is a service which can audit your instances and report about performance, costs, etc.
  • Trusted Advisor cannot change settings on your behalf; you have to change them manually yourself

Wrapping up, I really enjoyed this brief but intense course. I think it gave me a nice overview of what is nowadays the de facto standard platform, which makes it very useful and interesting to know something about. I would like to thank once again Ryan Little and Arlen Vartazarian, who were my patient and engaging instructors, and say that I hope to meet you again soon!