Friday, September 28, 2007

Creating a secure webauth system: Part 1 -- HMAC

This is the first in an n-part series about web authentication for a system where user identification and attribution is important, but content protection is not. This entry assumes that a secure method has been used to negotiate a shared secret -- as the result of username / password authentication over https, for example.

Obviously the user login / account registration portion of a web auth system will require some secure connection, but once that authentication is completed we'd like to make use of a more efficient open protocol. (eg: *http* vs. *https*). There are many reasons for this: better performance, client-side caching, etc.. I'm not going into those details here. Neither will I address the initial authentication step other than to say that part of a successful login is the negotiation of a shared secret other than the user's password. Ideally this is a 64-byte (or larger) id with a high probability of uniqueness. A GUID, essentially. (It is critical that the secret used is **NOT** the user's password!) The actual secured login and secret negotiation will be addressed in another entry . At least, that's the plan :).

Since our primary goal with this system is to ensure that people are who they say they are, and we've punted on the initial authentication (for now), the only place left for an attack is for some one to spoof a user who has already logged in. With no additional work, our login system would be useless -- some one could simply skip the entire authentication process and issue an RPC with instructions to do evil things as Alice's user without needing to know Alice's password. To prevent this, we need to ensure that the same user who authenticated initially is the user who issued the unsecured RPC. This is where the shared secret comes into play. Only Alice and the server know her shared secret, so if the secret is passed along as a parameter of the RPC, then that is a strong indication that Alice is who she says she is.

But wait! We can't just pass the secret as an RPC parameter, because these communications aren't secure. Charlie could lurk on the 'net, waiting for an RPC from Alice to the server, sniff out the secret, and then proceed to impersonate Alice. We could encrypt the secret, but then we just have a different secret -- Charlie doesn't need to know the unencrypted secret if the encrypted one works just as well. Alice and the server also need an agreed upon way to change the secret so it is different for each RPC, and this must be done in a way that Charlie can't take the changed secret and either (1) get the initial secret out, or (2) generate the next changed secret.

## HMAC: *keyed-Hash Message Authentication Code*

HMAC is a method of ensuring that a message (an RPC in our case) was generated by someone with access to a shared secret. HMAC makes use of some sort of one-way hashing function (like MD5 or SHA-1) to encrypt the secret along with a message. This generates a short digest of 16-20 bytes that acts as a fingerprint of the message+secret combination. When the digest is sent along with the message, the receiver (our server) can re-generate the hash with the same HMAC calculation and compare the locally generated digest with the digest that came along with the message. Remember: the server has the secret too, so it has enough information to confirm the digest.

So, back to our problem. Alice can now use the shared secret to create a digest of every RPC, and send that along with the RPC as a parameter. The server can then take the digest of the RPC and secret to compare, and then verify that the RPC actually originated with Alice, right?

Not *quite* yet.... there are still a couple holes in our plan.

Charlie could still sit back and snoop on Alice's traffic and save an **entire** RPC, complete with digest, and reissue that RPC later. This is better than letting Charlie do whatever he wants, but there are still some things that could be quite dangerous. Say Alice accidentally deletes something, and undoes the deletion. Charlie could re-issue the deletion and Alice would loose data. The server needs to know not to accept the same request twice (but what if Alice *wants* to do something twice, you ask? Well, we have to make Alice's second request a *little* bit different from the first one, which we can do!).

What if we create a digest of some sequence identifier and pass the sequence ID along with the RPC? Since the digest, ID, and RPC are inseparable (the digest and ID are obviously linked, and the RPC can't be sent without a valid digest, which Charlie can't reproduce, since the accompanied digest+id pair is only good once) then we don't need to create a digest of the entire RPC (it wouldn't hurt, it's just a fairly complex thing to do). By incrementing the sequence id and recalculating a digest of it and the secret, then we can keep from issuing the same request more than once, and the server will know to ignore duplicates.

So, this is where we're at:

* Charlie can't snoop the secret, since it's encrypted with a changing message (the sequence id)
* Charlie can't re-issue a "recorded" RPC invocation, because the digest can't be reversed and Charlie can't create a valid (digest, sequence) pair without the secret.
* Charlie can't change a RPC, again because of the trouble with creating a digest, sequence pair.

Charlie's only recourse is to try and find a secret which generates the same digests as the secret that Alice is using. This is theoretically possible, since Charlie could probably figure out the hashing algorithm used, and run a brute-force attack, hoping to luck out and find the secret quickly. The possibility of this happening is extremely low, however. Furthermore, each session will use a new secret, so Charlie will only have one session's worth of time to crack each secret. Even creating a rainbow table will fail if the secrets aren't of trivial length. (A 64-**bit** secret will be to large, and we're using secrets 8 times that size.)


## Technical details and further reading

When implementing an approach like this, make sure to guard the secret. It would be easy to accidentally store the secret on the web client as a plain cookie which will then be transmitted in the clear with each RPC, and therefore defeat the purpose. Use a secure cookie, or some other storage method to prevent this.

The [HMAC RFC](http://www.ietf.org/rfc/rfc2104.txt) describes the algorithm in detail (and it's a fairly short, easy to read RFC.) and the Wikipedia page gives a nice description too: [HMAC on WikiPedia](http://en.wikipedia.org/wiki/HMAC#Example_usage)

Friday, August 31, 2007

Things I need

There are many small apps that I wish I had, here's a short list of the ones that come to mind at the moment:

## A process monitor that shows the top consumer.

I often tack my system(s) to the max, and therefore run out of cycles frequently. While this is sometimes the result of batch computations that I've planed in advance, it is pretty common that I'll just be working away and all of a sudden things shudder to a halt. When this happens I want to know two things:
1. Is it processor-related, or is it memory-related, and;
2. What application is responsible?
Processor / memory monitors are a dime a dozen, but they are typically very small (showing only the usage, like gkrellm) or very large (showing all the applications in the top 20 or so, like top). I can't stand having top visible all the time, and it takes to long to get to a terminal and start up a monitor. By definition my system is not very responsive, and I never see what's causing the slowdown.

I need a small process and/or memory monitor that shows the top-using application in a tooltip, or optionally in an automatic pop-up when the usage hits a certain level.

## Universal acronym definitions.

Highlight an acronym, hit a keystroke, and see a list of the most common expansions of that acronym based on frequency of use.

## A calendar dock-app where the date on the dock icon is actually accurate.

Yeah, I actually want to look at my system bar-thingy and see what day it is, not some random number between 1 and 31 that the icon developer thought represented calendars.

Hovering over the icon shows the full time/date, which is configurable from an entry on the icon's context menu. I don't care what happens when I click on the icon, as long as I can make it do something arbitrary ;).

## The ability to refactor and generate source code from the command line.

MetaJava (http://wiki.ciscavate.org/index.php/MetaJava) could resolve this issue for one language, but I'd hate to stop there. A tool like this would mean amazing things for small-time development environments and text-editor lovers. (Emacs and vi would easily *eclipse* some other IDEs IMHO ;).

The idea is that you could easily create mini applications that read in specifications in some simple format and produce boilerplate for your required language, and/or move classes, rename variables / methods / packages / etc. without dedicating half your memory to an IDE that will then want to write 500 mb to swap, since all my memory is taken up by an application I have to run because....

## I don't have a web browser that doesn't suck.

"We hold these truths to be self-evident."

### ...and a web development environment to go with it.

That's a start. I'll add more as they occur to me.

Friday, May 18, 2007

(not (fill-paragraph))

I use emacs as much as possible, today being no exception. Currently,
I'm doing a fair bit of writing at work, and unfortunately that means
Word (or Open Office at best, depending on what OS I'm in). Neither
program supports much in the way of emacs compatibility modes, so if
I'm generating new content (as opposed to editing an existing doc.) I
tend to write in Emacs, and paste into Word when I'm finished. This
works pretty well, considering.

There is one very annoying issue though: when in emacs, I use
`auto-fill-mode` to keep the content on screen as I type. The problem
is that `auto-fill-mode` breaks lines with literal newline characters,
while the word wrapping in Word / OpenOffice just wraps the content
without additional characters. As a result each line ends up as its
own paragraph when I paste content from emacs into Word. The solution,
of course, is to extend emacs with a simple function to undo
`auto-fill`.

## Merging lines

The first problem, as I saw it, was to find a function that would
merge two adjacent lines, and leave them separated by a single space.
Unfortunately, such a thing doesn't seem to exist. No problem, we
need to go to the end of the current line (`end-of-line`), search
backwards for the first non-whitespace character (`[^ \t]`), erase
the rest of the line, including the end line (`kill-line`), and insert
a space (`(insert " ")`).


(defun mergelines(&optional backward)
"Merges the following line with this line, or merges this line
with the previous line if a prefix argument is provided.
Removes any whitespace between lines, replacing it with a
single space."
(interactive "P")
(if backward
(previous-line))
(end-of-line)
(re-search-backward "[^ \t]")
(forward-char)
(kill-line)
(insert " "))


To make it more useful, I added a parameter that determines if the
line below, or line above should be merged. This made the rest of the
`unfill` function much easier to write.

## Un-filling a region

Now that we can merge lines, lets address the problem of unfilling a
*bunch* of lines. Since I know there is a function `mark-paragraph`
already, lets just deal with arbitrary regions for now.


(defun unfill-region(rstart rend)
(interactive "r")
;; get to the end of the region:
(goto-char rend)
;; if the region ends on the first char. of a line, move up a line.
;; this makes it easier to select a paragraph and apply the function.
(if (= (point) (line-beginning-position))
(previous-line))
;; loop while the point isn't on the starting line:
(while (not (= (line-number-at-pos (point))
(line-number-at-pos rstart)))
;; merge with previous line.
(mergelines t)))


I've tried to comment well, so it should be relatively straight
forward, but here's an overview of the algorithm anyway:

1. `(interactive "r")` just means that the current region's start and
end locations are stored in the parameters `rstart` and `rend`.
2. We need to merge from the bottom up, because if we merge from the
top down we need to keep track of the lines merged, and things
generally become more complex (we might end up merging to many lines
if we loose count.). Because of this, we first move to the end of the
region.
3. Since you (well, I) generally select from the first column, and
move one line past the last line I need (try it if you don't
understand what I mean), I needed a special case to keep from merging
the empty line between paragraphs.
4. Now, we merge each line with the line above, which moves the point
up a line too. When the point is on the same line as the start of the
region, we stop.

I haven't merged it with `mark-paragraph` yet, but it would be trivial
to do so. More importantly, I want to make it skip blank lines, so it
will be possible to mark an entire document, and call `unfill-region`
(and therefore, write `unfill-buffer`). As it is now, if you do that
the entire document ends up on one line, which is not usually ideal :)

Wednesday, March 21, 2007

The Matrix is under construction

### <blink>12:00</blink>

_Artificial Intelligence_ is a term with a great deal of accumulated
baggage. Throughout the years sci-fi authors and screenwriters have
depicted AI as a marvelous double-edged sword. On one hand, the
benefits of 'AI' are myriad--free, inexhaustible and ethical sources
of labour could greatly increase our productivity, even to a point
beyond that of reason, allowing everyone to live relaxed lives of
artistic purity. To top it all off, such a society of musicians and
artists could generate their own entertainment, thus bankrupting the
RIAA and MPAA. (Really, do your Utopian dreams top that?)

Then, as the story progresses, the AI start to become devious. Humans
become restless in their artistic pursuits while the machines evolve
_ghosts_ that resent their inventors. At the last moment, just before
the annihilation of all humanity, or the eternal slavery of the race,
Keanu Reeves (or Will Smith) shows up to do battle in Wachowski-style
slow-mo while using the inconsistencies of English to lock the evil
network of machines into a final state of illogic and
self-destruction. This, of course, destroys all instances of the
rouge process, and life returns to the state it was in before
automation created such a false utopia. (...and presumably we're back
at square 1, listening to overpriced music from underpaid artists.)

The result of all these (highly entertaining, I must say)
sensationalistic portrayals of automation gone awry is that we're all
somewhat afraid of being slaves to robots.

And none of you will admit it. (I work in the field, so I can't be
afraid ... can I?)

### Honey, have you seen the roomba?

Ok, I admit it, I've had the occasional dream about rabid computers
charging around and directing people to do whatever robots want people
to do. Usually I'm about to meet my untimely demise right when the
central AI segfaults because it's "attack" routine takes a `double`
and I happened to be 1/3 of a distance unit away, causing a rounding
error that escalates and ends as a divide-by-zero, crashing the entire
system.

The dreams can be scary for a while, but I can't convince my self that
I'll ever be chased by a truly well designed and tested robot. Let
alone one that's self-aware.

That's actually only part of the reason I'm not worried about
an AI-controlled utopia ever occurring. The rest of the reason
actually isn't germane to this essay, believe it or not!

### Fine. Forget it, I'll do it in Word.

I'm going to start this off with a quick tangential story about a
friend of mine.

> This friend works for a company that has a wiki hosted on some
> external site that is maintained by the hosting company (call the company
> Hoster). Hoster is serious about security. In fact they're using
> some sort of automated attack-detection service which can determine
> when someone is trying to crack their servers or perform some other
> devious deed.
>
> When Hoster's system detects an "attack" it blacklists the attacker's
> IP block, and the attacker can no longer get near the server.
> Everything would be fine and dandy, but in this system's eyes, my
> friend and his coworkers often stage "attacks" against their own wiki.
> Therefore they have to contact Hoster every week or so, and ask that
> the ban be lifted. The last time this happened, my friend asked
> Hoster to put the company IP Block on a whitelist, granting them Carte
> blanche without being banned.
>
> The response?
>
> Hoster: "We can't."
> Friend: "But this happens all the time."
> Hoster: "yeah, we can't."
> Friend: "But this happens **ALL** the time."
> Hoster: "sorry, it's a good idea and all, we just can't put you on a
> white list."

I have some theories about why Hoster can't exclude their customers
from their own security tools. Hoster most certainly didn't develop
the blacklisting tool in-house, and the phone tech would have no
access to the internal configuration at all. Odds are, Hoster has a
simple web interface to do wiki management, and one of the pages in
that UI shows the list of blacklisted IPs, if that. The phone tech
can then go in and search for a given computer and remove it from the
blacklist. Hoster probably can't modify the whitelist at all through
the web ui, it's just not a feature.

So, why isn't it a feature? Let's peel back another layer and look at
the company/dev team that produced the blacklisting tool. Odds are
the tool is using an off-the-shelf classifier, which aren't renowned
for being easy to understand without a lot of examination. Perhaps
the classifier is actually an embedded part of the firewall system.
The blacklist could be a nothing more than a list of routing rules to
deny traffic from the "bad" addresses. Removing an IP would be
trivial--delete the rule, but whitelisting would be virtually
impossible if the firewall was too tightly coupled with the
classifier.

Have you ever run across other applications that exhibit similar
behavior? The IBM OmniFind enterprise search app throws internal
server errors when you query for ["international
suspect"](http://omnifind.ibm.yahoo.net/forums/index.php/topic,668.0.html)
with the default settings and some document collections. How does
this happen? (IBM is hard at work on that problem, by the way.)
Using open source tools opened my eyes to many absurd things I do to
placate my tools, mostly because I forgot all the tricks I needed to
use Windos 98 without making it crash (Click here, wait, use the File
menu to close the app, but not if it's maximized.. that sort of
thing.) There are studies of this sort of thing--the cognitive
dimensions and attention investment both address user confusion and
effort when using an application. There is even a group at Microsoft
dedicated to improving APIs based on the cognitive dimensions (I
really hope they just haven't gotten around to .NET 2.0 yet).

How much is poor design / implementation impacting the way we use our
computers? Hosters could loose customers because they can't add
people to a whitelist, which could very conceivably be due to software
design. In some small way, they are already being controlled by their
servers, and Will Smith is busy talking to
[fish](http://en.wikipedia.org/wiki/Shark_Tale).

Anyhow, that's my rant. I'm afraid that we're painting ourselves into
a corner by building larger and larger applications that all impose
their own restrictions on how we can use and extend our tools. If we
don't get over that, we'll never be running in fear from sentient
vacuum cleaners and robotic dogs. (I should point out that I don't think
the solution is to stop building large systems, rather we should focus
on [maintainability, extensibility, QWAN,
etc..](http://steve-yegge.blogspot.com/2007/01/pinocchio-problem.html)).

Tuesday, March 20, 2007

mt.el: posting from emacs

MT + Emacs + Markdown & Geshi?
------------------------------

Is it possible? We're here to find out :) I just got around to
installing `ml.el` in emacs, and this post is essentially a test to
see if markdown syntax will work (and round-trip to Movable Type and
back to emacs -- it seems to come *from* mt correctly...).


### Source code:

transcode-language: java
public class TestClass{

/**
* test
*/
public static void main(String[] args){
// ...
}
}

### Well, not quite.

Everything seems to work, aside from the <pre ...> tags I use for code
formatting with geshi. I'll have to look into a way of incorporating
that with some existing markdown formatting trick.

Ah-ha! The MT Geshi plugin I'm using
([transcode](http://periodic-kingdom.org/ben/)) expects code blocks to
be in the following format:

<pre><code>transcode-language: language
....
</code><pre>

Markdown turns all consistantly indented regions into
<pre><code>..</code></pre> blocks, so all you have to do is to start
each code block with the (somewhat ugly) transcode-language: lang
line. It's taken out by transcode, so the source will show up w/out
it. Next task: Add an emacs filter to turn <code lang="*lang*">... into
the above mentioned indentation/transcode syntax.