Weapons of Mass Analysis

Wednesday, February 16, 2011

Closing the Loop -- Re-Engineering Android Applications

A lot of people have been asking for slides or more tutorial material for Android reverse engineering after my talks at BSides and Shmoocon. Problem is, neither of these cons actually recorded the talks, so instead I have put together a screencast demonstrating the workflow involved in re-engineering an application, adding a password logger and verifying its operation. This has two benefits -- first, it demonstrates some of these techniques without being redundant with my "Android Reverse Engineering Using the Emulator" and "Android Anatomy" talks; second, it serves as a good demonstration for non-hackers, showing how easy it is to patch applications.

This technique is very common in the Android Market right now, with people modifying apps for good and bad reasons -- at some point, Google is going to have to do some level of verifying "good" applications and "responsible" developers, because the current market is packed with apps that demonstrate varying levels of naughtiness.

Monday, January 31, 2011

mp3collect.go -- reorganizing mp3 files by hashes of their mpeg-1 content

A friend asked me a couple weeks ago for a sample of what a "real" Go program looks like. I have been using Go quite a bit for fuzzers and analysis packages at IOActive for the last few months, but I obviously can't share those with anyone else. On the flight back from Shmoocon, I decided to write a Go program to solve a problem that has been slowly building up in my ~/music directory.

It's a real trainwreck; between cycles of using iTunes and copying my music between devices, I now have this mass of duplicated songs with tweaked ID3 tags, so I cannot simply de-duplicate them by hashing the files. The solution is to calculate a hash for the actual MPEG frames in each file and ignore all the helpful metadata.

The program does just that by constructing hard links between a file and the hash of its media contents; duplicates are reported and left intact. The plan is to go back through those files and normalize their ID3 metadata using a program that doesn't try to "organize" my music -- Quod Libet. (Or the Android music player, which is too dumb to attempt any of this.) There is, of course, room for improvement -- it does not handle FLAC, OGG or M4A files, which do occur in my library due to certain stores using non-MP3 formats. (Trent Reznor, Rhythmbox and iTunes, respectively.) It also should have a way of properly handling cross-filesystem collections by copying the file instead of hardlinking.

mp3collect.go
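
For readers who don't want to dig through the Go source, here is a rough Python sketch of the same idea. It is not a port of mp3collect.go: instead of walking the MPEG frames it takes a shortcut and strips a leading ID3v2 tag and a trailing ID3v1 tag before hashing, and the names audio_bytes, media_hash and collect, the choice of SHA-1, and the ".mp3" link naming are all my own assumptions.

```python
import hashlib
import os


def audio_bytes(data):
    """Return the file contents with leading ID3v2 and trailing ID3v1 tags removed."""
    start = 0
    # ID3v2 header: "ID3", 2 version bytes, 1 flag byte, 4-byte syncsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # 7 useful bits per size byte
        start = 10 + size
        if data[5] & 0x10:  # ID3v2.4 footer flag adds another 10 bytes
            start += 10
    end = len(data)
    # ID3v1 trailer: the last 128 bytes, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def media_hash(path):
    """Hash only the audio portion of the file (SHA-1 is an arbitrary choice)."""
    with open(path, "rb") as f:
        return hashlib.sha1(audio_bytes(f.read())).hexdigest()


def collect(paths, dest_dir):
    """Hard link each file to dest_dir/<hash>.mp3; return the duplicates found."""
    duplicates = []
    os.makedirs(dest_dir, exist_ok=True)
    for path in paths:
        link = os.path.join(dest_dir, media_hash(path) + ".mp3")
        if os.path.exists(link):
            duplicates.append(path)  # reported and left intact
        else:
            os.link(path, link)  # fails across filesystems, as noted above
    return duplicates
```

Two copies of a song that differ only in their tags end up with the same hash, so the second one is reported as a duplicate rather than linked.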

Tuesday, December 21, 2010

Introducing Fuzzex, Generating Random Data From Regexes

Fuzzex produces sequences of random bytes using a generation language that is similar to that commonly used by regular expressions for parsing data. This similarity enables testers who are familiar with regular expressions to produce test data that can satisfy an application's superficial input validation and parsing without getting bogged down in specialized frameworks such as Sulley or Peach.

In situations where the regular expressions used for parsing and validation are available, Fuzzex enables using these expressions directly to develop tests that demonstrate potential weaknesses and exercise internal surfaces.

Example, a Very Permissive Email Address Regex:

>>> fuzzex.generate( '[^@]+@([^.]+)([.][^.]+)+' )
'\x07m\x10@\x0cI\x12%.\x1a.f.:'
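
To give a feel for what a generator has to do with that expression, here is a hand-rolled sketch for this one pattern only. It is not Fuzzex and does not use its API; the helper names (negated_class, one_or_more, permissive_email) and the repetition limit of 4 are assumptions made for illustration.

```python
import random
import re

# The permissive email regex from the example above, used only for checking.
PATTERN = re.compile(r"[^@]+@([^.]+)([.][^.]+)+")


def negated_class(excluded, rng):
    """Pick a random character in the range 0-255 that is not in `excluded`."""
    while True:
        c = chr(rng.randrange(256))
        if c not in excluded:
            return c


def one_or_more(gen, rng, limit=4):
    """Emulate `+` by repeating a generator between 1 and `limit` times."""
    return "".join(gen() for _ in range(rng.randint(1, limit)))


def permissive_email(rng=random):
    """Generate a random string that the permissive email regex accepts."""
    local = one_or_more(lambda: negated_class("@", rng), rng)
    first = one_or_more(lambda: negated_class(".", rng), rng)
    rest = one_or_more(
        lambda: "." + one_or_more(lambda: negated_class(".", rng), rng), rng
    )
    return local + "@" + first + rest


sample = permissive_email()
print(repr(sample), bool(PATTERN.fullmatch(sample)))
```

Each regex construct maps onto a small generator, which is why test data built this way gets past the same superficial validation the regex implements.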

Thursday, November 18, 2010

Spot the Crypto Bug

Had a fun crypto bug crop up in a discussion, today; the code in question, functions changed to protect the guilty:

   iv := read_cprng( 16 )
   enc := aes_enc( key )
   ciphertext := cbc_enc( iv, enc, iv + plaintext )

Where cbc_enc is a function that accepts an initialization vector, a block encryption function, and a buffer containing the plaintext to encrypt, and applies that function using the Cipher Block Chaining mode and the initialization vector.

Can you spot why, given a constant plaintext and key, the ciphertext never varies, regardless of variance in the IV?
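
If you would rather poke at it than stare at it, here is a small Python sketch that reproduces the symptom. The Python standard library has no AES, so toy_block_enc is a stand-in I made up (an HMAC-SHA256 keyed function truncated to 16 bytes); it is not a real cipher and cannot be decrypted, but CBC encryption only ever calls the block function in the forward direction, so the behavior in question is the same.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes, matching the 16-byte IV in the snippet


def toy_block_enc(key):
    """Stand-in for aes_enc(key): a keyed 16-byte function, NOT real AES."""
    def enc(block):
        return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]
    return enc


def cbc_enc(iv, enc, data):
    """Plain CBC: XOR each block with the previous ciphertext block, then encrypt."""
    assert len(data) % BLOCK == 0, "sketch assumes pre-padded input"
    prev, out = iv, b""
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = enc(mixed)
        out += prev
    return out


def buggy_encrypt(key, plaintext):
    """Mirrors the three lines from the post, with os.urandom as the CPRNG."""
    iv = os.urandom(BLOCK)
    enc = toy_block_enc(key)
    return cbc_enc(iv, enc, iv + plaintext)


key = b"k" * 16
message = b"attack at dawn!!"  # exactly one block
print(buggy_encrypt(key, message) == buggy_encrypt(key, message))
```

Every call draws a fresh random IV, yet the comparison at the end comes out equal.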

Sunday, November 14, 2010

Lexical Analysis of C using Python and Ply

Code reviewers fall into two camps: those who rely on grep and their favorite text editor for review, and those who rely on a sophisticated language-specific review environment or IDE with a cross-reference generator. Consultants tend to be in the former camp, as getting a customer's random code base into an IDE can be almost as miserable as getting it out.

I use a hybrid strategy, involving a simple webapp that does syntax highlighting and grep with a few simple features that let me combine common browsing habits (history, document tabs and linking) with a minimal-expectations environment. It isn't beautiful, or featureful, but it doesn't interrupt my flow.

Of course, there's always room for improvement, like a cross-reference of identifiers and the source files that mention them. This requires simple lexical analysis, which is where a smart C programmer reaches for Flex. So, where does a Python programmer go? My best guess is Ply (Python Lex-Yacc), which merges Lex semantics with Python metaprogramming.

So, in WEPMA fashion, here is the interesting bit, a lexical analyzer that produces identifiers, line numbers, and tokens indicating the start and end of lexical scopes. It is barely smart enough to filter out comments and strings, and tolerant of unanticipated syntactic elements because, obviously, I couldn't be bothered to implement a full C lexer.

Enjoy, and no, you can't have my review tool. :)
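
For a rough idea of the shape of such a lexer without installing anything, here is a sketch that swaps Ply for the standard library's re module. It is not the lexer from the post; the token names (IDENT, OPEN, CLOSE) and the lex function are assumptions, and C keywords come out as plain identifiers because nothing here filters them.

```python
import re

# One alternative per token kind; anything unanticipated falls through to OTHER.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/|//[^\n]*)
  | (?P<STRING>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OPEN>\{)
  | (?P<CLOSE>\})
  | (?P<NEWLINE>\n)
  | (?P<OTHER>.)
""", re.VERBOSE | re.DOTALL)


def lex(source):
    """Yield (kind, text, line) for identifiers and scope delimiters only."""
    line = 1
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("IDENT", "OPEN", "CLOSE"):
            yield kind, text, line
        # Comments and strings are dropped, but their newlines still count.
        line += text.count("\n")


if __name__ == "__main__":
    code = 'int main() {\n  /* { */\n  return puts("}");\n}\n'
    for token in lex(code):
        print(token)
```

The catch-all OTHER rule is what makes it tolerant: operators, numbers and preprocessor junk are consumed one character at a time and ignored.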

Saturday, October 30, 2010

JavaScript, Closures, and Wasteful APIs

First, read this function: later (YUI). I consider this a great example of how framework developers can overreach with abstractions, considering that JavaScript has lexical scope and closures that are fairly easily implemented. Now, read the implementation: YUI-Later.js

Yahoo has written roughly 30 lines to encapsulate and abstract the simple functionality of passing a thunk to either setInterval or setTimeout. An example, stripped from Todd Kloots' YUI 3 demo:

var args = [ 1,2 ]
Y.later( 50, gizmo, gizmo.foo, args )

Could be more simply expressed as:

setTimeout( function( ){ gizmo.foo( 1, 2 ) }, 50 )

And, hey look, no CDN callout required. No need for a code reviewer to reach out for YUI's documentation to find out the special semantics of YUI, and it explains exactly what it means. And, bonus, fewer keystrokes.

Libraries like jQuery and YUI have valuable capabilities, such as concealing all of the W3C's pointless DOM verbosity behind more modern XPath-like selectors. But when these frameworks feel the need to abstract away closures, all I really see is a developer who has lost touch with the clear simplicity of JavaScript, and I start wondering if they get paid by the API function.

Monday, October 18, 2010

Long Polling with Node.JS and Express

When I write tools or algorithms that I want other people to improve or understand, I use Python. When I am writing them for myself, because I'm in a hurry, I use Lisp. (I think in closures and the application of functions, which I occasionally force myself to re-express in classes and methods.) Since Python's lambda syntax is a great disappointment to me and its father, Guido van Rossum, I occasionally pine for the weird cousin of Lisp we call JavaScript.

JavaScript is regarded by Lisp hackers as Lisp without parentheses, shackled by the problem domain of browser scripting. It's a great, powerful language for people who think in closures, but until the recent introduction of libraries like jQuery, it was also shackled to really cruddy libraries. When Google released V8 under the BSD license, I think many of us immediately ran to check out the source, wrote a partial general purpose environment, then wandered off to do better things. Like bugfixes for MOSREF. *cough*

Ryan Dahl, unlike the rest of us, stuck with it, and fused V8 with the similarly fascinating libev to produce a JavaScript environment for I/O-centric problems that don't live solely within the browser. The resulting Node.JS strikes an interesting balance between minimalism, functionality, and performance thanks to its reliance on existing projects with great characteristics.

When I encounter a new language or framework, I fall back on a set of problems dear to my heart -- writing a MUD server. With web frameworks, lately, this has been simplified down into "can I write a long-polling message wall with it?" Simple problem, tends to break most simplistic web frameworks simply because requests are often deferred, waiting for an update.

Here it is in Node.JS, using Express, about 50 lines of overcommented code. I'm sure it could be written faster, but probably not as concisely.
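
The core of the problem is language-independent: a poll request has to park until something newer than what the client has already seen shows up, or until it times out. Here is a minimal Python sketch of just that piece, using a condition variable where Node would use callbacks. It is not the Express code from this post; MessageWall, post and wait_since are names I made up, and the HTTP layer is left out entirely.

```python
import threading


class MessageWall:
    """In-memory message wall whose readers can block until news arrives."""

    def __init__(self):
        self._messages = []
        self._cond = threading.Condition()

    def post(self, text):
        """Append a message and wake every parked poller."""
        with self._cond:
            self._messages.append(text)
            self._cond.notify_all()

    def wait_since(self, index, timeout=30.0):
        """Return messages from `index` on, waiting up to `timeout` seconds.

        An empty list means the poll timed out and the client should re-poll.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._messages) > index, timeout=timeout
            )
            return self._messages[index:]


if __name__ == "__main__":
    wall = MessageWall()
    # Simulate another client posting shortly after our poll begins.
    threading.Timer(0.1, wall.post, args=("a kobold appears",)).start()
    print(wall.wait_since(0, timeout=5.0))
```

In a threaded server each parked poll ties up a thread, which is exactly the cost an event loop like Node's avoids.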

Next up, making a kobold walk around a message board.. ;)

Thursday, August 12, 2010

More Fun With Nessus Reports

A common grievance for security professionals dealing with Nessus reports is that the report is organized by host or IP address, which makes it difficult to organize findings by type of vulnerability. This script is a little more complicated than "nsfix", but probably more useful. Enjoy.

nscross.py
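
The regrouping itself is simple once the report is parsed. As a sketch of the idea only (this is not nscross.py, and it sidesteps report parsing by assuming the findings are already available as (host, port, plugin_id) tuples, a shape I picked for illustration):

```python
from collections import defaultdict


def by_plugin(findings):
    """Turn host-ordered findings into a plugin_id -> [(host, port), ...] mapping."""
    index = defaultdict(set)
    for host, port, plugin_id in findings:
        index[plugin_id].add((host, port))
    # Sort both levels so the output is stable between runs.
    return {pid: sorted(hosts) for pid, hosts in sorted(index.items())}


if __name__ == "__main__":
    # Hypothetical findings; the plugin IDs here are placeholders.
    report = [
        ("10.0.0.1", "ssh (22/tcp)", 1001),
        ("10.0.0.1", "http (80/tcp)", 1002),
        ("10.0.0.2", "ssh (22/tcp)", 1001),
    ]
    for plugin_id, hosts in by_plugin(report).items():
        print(plugin_id, hosts)
```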

(I reserve the right to be somewhat embarrassed if the Nessus experts come out of the woodwork with an option to do this, too, from the Nessus GUI..)

Wednesday, August 11, 2010

Nessus False Positives Getting Underfoot?

So.. After you've run the scan, you've found yet another false positive in Nessus due to the idiosyncrasies of your environment. Here is a script to purge a particular plugin from a Nessus report so you don't have to redo the scan after fixing your scan parameters.

nsfix.py

This may work on OpenVAS reports, let me know if it causes a problem. As always, improvements are welcome.

Updated: pauldotcom on Twitter makes an excellent point that this can be achieved using the "Report Filters" interface. I blame my fear of Flash GUIs for not finding this.

Monday, July 19, 2010

Cross-Platform Raw Character Input in Python

Handy trick for Python hackers who need to grab a keypress from the terminal but don't want to get bogged down in Curses.

getch.py
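
The usual shape of this trick, sketched here from the standard library rather than copied from getch.py, is to use msvcrt on Windows and to fall back to flipping the terminal into raw mode with termios and tty everywhere else. Treat it as one way to do it, not necessarily what the linked script does.

```python
import sys

try:
    # Windows: the console API hands back one keypress directly.
    import msvcrt

    def getch():
        """Read a single keypress without waiting for Enter."""
        return msvcrt.getwch()

except ImportError:
    # POSIX: put the terminal in raw mode for the duration of one read.
    import termios
    import tty

    def getch():
        """Read a single keypress without waiting for Enter."""
        fd = sys.stdin.fileno()
        saved = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)
            return sys.stdin.read(1)
        finally:
            # Always restore the terminal, even if the read is interrupted.
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)


if __name__ == "__main__":
    print("Press any key..")
    print("You pressed", repr(getch()))
```

The POSIX branch needs stdin to be a real terminal; it will raise if input is piped in.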