Sunday, November 14, 2010

Lexical Analysis of C using Python and Ply

Code reviewers fall into two camps: those who rely on grep and their favorite text editor, and those who rely on a sophisticated language-specific review environment or an IDE with a cross-reference generator. Consultants tend to be in the former camp, as getting a customer's random code base into an IDE can be almost as miserable as getting it out.

I use a hybrid strategy: a simple webapp that does syntax highlighting and grep, plus a few simple features that let me combine common browsing habits (history, document tabs, and linking) with a minimal-expectations environment. It isn't beautiful or featureful, but it doesn't interrupt my flow.

Of course, there's always room for improvement, such as a cross-reference of identifiers and the source files that mention them. This requires simple lexical analysis, which is where a smart C programmer reaches for Flex. So where does a Python programmer go? My best guess is Ply (Python Lex-Yacc), whose lexer merges Lex semantics with Python metaprogramming.

So, in WEPMA fashion, here is the interesting bit: a lexical analyzer that produces identifiers, line numbers, and tokens indicating the start and end of lexical scopes. It is barely smart enough to filter out comments and strings, and it tolerates unanticipated syntactic elements because, obviously, I couldn't be bothered to implement a full C lexer.
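As a rough illustration of the kind of lexer described above, here is a minimal sketch. It is not the Ply code from this post: it swaps in Python's standard `re` module so it runs without dependencies, and the names `lex_c`, `cross_reference`, `C_KEYWORDS`, and `SAMPLE` are hypothetical. It assumes that C keywords should be dropped and that anything it doesn't recognize can simply be skipped.

```python
import re
from collections import defaultdict

# Assumption: C89 keywords are noise in a cross-reference, so drop them.
C_KEYWORDS = frozenset("""
    auto break case char const continue default do double else enum extern
    float for goto if int long register return short signed sizeof static
    struct switch typedef union unsigned void volatile while
""".split())

# Ordered alternatives: comments and literals come first so that
# identifiers inside them are never reported.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/|//[^\n]*)      # block and line comments
  | (?P<STRING>"(?:\\.|[^"\\\n])*")      # string literals
  | (?P<CHAR>'(?:\\.|[^'\\\n])*')        # character literals
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)    # identifiers and keywords
  | (?P<NUMBER>\d[\w.]*)                 # numbers, so suffixes are not identifiers
  | (?P<OPEN>\{)                         # start of a lexical scope
  | (?P<CLOSE>\})                        # end of a lexical scope
  | (?P<NEWLINE>\n)
  | (?P<OTHER>.)                         # anything unanticipated: skip it
""", re.VERBOSE | re.DOTALL)


def lex_c(source):
    """Yield (kind, text, line) for identifiers and scope delimiters."""
    line = 1
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("OPEN", "CLOSE") or (kind == "IDENT" and text not in C_KEYWORDS):
            yield kind, text, line
        # Comments and strings may span lines, so count newlines in every token.
        line += text.count("\n")


def cross_reference(files):
    """Map each identifier to a sorted list of (filename, line) mentions."""
    index = defaultdict(set)
    for name, source in files.items():
        for kind, text, line in lex_c(source):
            if kind == "IDENT":
                index[text].add((name, line))
    return {ident: sorted(places) for ident, places in index.items()}


SAMPLE = r'''/* count is mentioned here, but only in a comment */
int count(const char *s) {
    int n = 0;            // n tracks the length
    while (s[n] != '\0') { n++; }
    puts("n is not counted here");
    return n;
}
'''

if __name__ == "__main__":
    for ident, places in sorted(cross_reference({"sample.c": SAMPLE}).items()):
        print(ident, places)
```

Like the lexer described above, this sketch knows nothing about the preprocessor, so a line such as `#include <stdio.h>` would report `include`, `stdio`, and `h` as identifiers. The `OPEN` and `CLOSE` tokens are emitted but not used by `cross_reference`; a caller could track nesting depth with them.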

Enjoy, and no, you can't have my review tool. :)

5 comments:

  1. Heads up, I'm re-using your code.

    I need a partial C codewalker for an unusual purpose. Result will be open source.

