Jallie User Manual


Introduction

Jallie is a tool for disassembling, exploring, updating, and writing files in the Java class file format. It has a few batch-mode options for disassembling or running scripts over class files, but its most powerful usage is interactive mode with the class file loaded into memory. In this mode, you have a Python command prompt with full access to all of Python, and the class is parsed and loaded into specialized data structures that make exploring, dumping, or changing the classfile data safe and easy. In effect you are running a glorified binary editor that takes care of much of the class file bookkeeping for you.

As this program deals with Java classfiles at a rather low level, it is useful to refer to the Java Virtual Machine Specification when familiarizing yourself with the program. Especially useful is Chapter 4, which defines the data formats and layout of the data structures in a classfile. Each structure defined in this chapter is present in jallie, with the same name. When browsing through a classfile with jallie, the intent is that you'll see exactly the same structures, populated with the data from the classfile you're looking at.
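
To make the connection to Chapter 4 concrete, here is a minimal standalone sketch (plain Python 3, not part of jallie) that unpacks the first four ClassFile fields with the standard struct module. The function name parse_header and the sample bytes are illustrative assumptions; the field names and sizes follow the ClassFile structure in the specification.

```python
import struct

def parse_header(data):
    """Unpack the first four ClassFile fields from raw class file bytes."""
    # u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count,
    # all stored big-endian in the class file format.
    magic, minor, major, cp_count = struct.unpack_from('>IHHH', data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError('not a class file: bad magic 0x%08x' % magic)
    return {
        'magic': magic,
        'minor_version': minor,
        'major_version': major,
        'constant_pool_count': cp_count,
    }

# First ten bytes of a Hello.class compiled for class file version 50.0
# (hypothetical sample; read a real file with open(path, 'rb').read()).
sample = bytes.fromhex('cafebabe00000032001d')
header = parse_header(sample)
```

The same four fields are the first ones jallie shows when a ClassFile is printed.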

This manual demonstrates concepts using examples. The example boxes show the exact commands typed and the response that jallie provides. If you're familiar with Python, you may notice that the command interface is simply a Python interpreter, so you may run any Python commands you wish. If you're not familiar with Python, don't worry -- just about everything you will want to do uses a small, straightforward subset of Python syntax. You can easily use this program without knowing any Python by following the examples in this document.

Just about all of the examples assume a simple "Hello world" classfile in the current directory. Here's the Java source:

public class Hello {
    public static void main(String argv[]) {
        System.out.println("Hello, world!");
    }
}
    

Just compile it, and then you can follow along:

user@home> javac Hello.java
    


Getting Started

As of now, there's no fancy installation or unpacking needed for using the program. You can simply download the latest tar/zip file, unzip it with your favorite unzip program, change the file permissions to execute, and then you can run it. It does require the python program to be installed and available on your path.

Here's how it's done on just about any unix-based platform (or cygwin, on windows):

user@home> gunzip -dc jallie-XXX.tar.gz | tar xf -
user@home> chmod +x jallie
user@home> ./jallie Hello.class
... 
    

Disassembling Class Files

The most straightforward way to view your class files is to simply disassemble them. You can do this in jallie's batch mode, in which case it disassembles all files listed on the command line and prints them out in excruciating detail (right down to the fields of the bytecodes). Files can be listed as relative or absolute paths, or you can provide the dotted-package name format (like you would for java or javah). When disassembly has completed, the program exits.

user@home> ./jallie --dis Hello.class 2>/dev/null
ClassFile  {
  u4 magic = 0xcafebabe
  u2 minor_version = 0
  u2 major_version = 50
  u2 constant_pool_count = 29
  cp_info constant_pool[29] {
    [ 0 ] [Unused]
    [ 1 ] CONSTANT_Methodref_info  {
      u1 tag = 10
      u2 class_index = #6 // "java/lang/Object"
      u2 name_and_type_index = #15 // <init> ()V
    }
    [ 2 ] CONSTANT_Fieldref_info  {
      u1 tag = 9
      u2 class_index = #16 // "java/lang/System"
      u2 name_and_type_index = #17 // out Ljava/io/PrintStream;
    }                                                                           
[...]
user@home>
    

The full output was clipped for brevity, but is available here if you want to browse through it to get an idea of what is displayed. The 2>/dev/null part of the command just hides the Python clutter that gets sent to stderr.

Things to notice here:

  • Each structure is displayed using its name (ClassFile, CONSTANT_Methodref_info), and a list of its fields with the field values displayed inline. Primitive fields (u4, u2) have their values displayed directly. When there's additional meaning associated with a primitive field value, the value is formatted, annotated, or sometimes replaced by the meaning rather than just the raw byte values.
  • Array values are displayed in typical array notation: the base type of the array, the name of the field, and then brackets which contain the number of elements. Each element in the array is prefixed with its index in brackets, followed by the contents of the element.
  • In some places in the classfile, an array can only hold one type of data. In most other places, though, the array is defined with a base type but actually contains instances of subclasses of that base type. The base type is used in the field specification of the array, and the actual subclass is shown for each element when appropriate.
  • The hash (#) notation is used to indicate constant pool index values.
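
The constant pool entries in the listing above can be reproduced from the raw bytes. The following is a rough standalone sketch in Python 3, not jallie's own parser: the names FIXED_ENTRIES and read_entries are made up for illustration, only a handful of tags are handled, and the sample bytes are the first seven entries of a Hello.class like the one used in this manual.

```python
import struct

# tag -> (structure name, struct format, field names) for a few fixed-size entries.
FIXED_ENTRIES = {
    7:  ('CONSTANT_Class_info', '>H', ('name_index',)),
    8:  ('CONSTANT_String_info', '>H', ('string_index',)),
    9:  ('CONSTANT_Fieldref_info', '>HH', ('class_index', 'name_and_type_index')),
    10: ('CONSTANT_Methodref_info', '>HH', ('class_index', 'name_and_type_index')),
    12: ('CONSTANT_NameAndType_info', '>HH', ('name_index', 'descriptor_index')),
}

def read_entries(data, count):
    """Decode `count` constant pool entries; return a list of (name, fields)."""
    entries, pos = [], 0
    for _ in range(count):
        tag = data[pos]          # u1 tag selects the structure type
        pos += 1
        if tag == 1:             # CONSTANT_Utf8_info: u2 length, then the bytes
            (length,) = struct.unpack_from('>H', data, pos)
            # Plain UTF-8 decoding is adequate for ASCII names; the class file
            # format actually uses a modified UTF-8 encoding.
            text = data[pos + 2:pos + 2 + length].decode('utf-8')
            entries.append(('CONSTANT_Utf8_info', {'length': length, 'bytes': text}))
            pos += 2 + length
        elif tag in FIXED_ENTRIES:
            name, fmt, fields = FIXED_ENTRIES[tag]
            values = struct.unpack_from(fmt, data, pos)
            entries.append((name, dict(zip(fields, values))))
            pos += struct.calcsize(fmt)
        else:
            raise ValueError('tag %d is not handled by this sketch' % tag)
    return entries

pool_bytes = bytes.fromhex(
    '0a0006000f'          # #1 Methodref
    '0900100011'          # #2 Fieldref
    '080012'              # #3 String
    '0a00130014'          # #4 Methodref
    '070015'              # #5 Class
    '070016'              # #6 Class
    '0100063c696e69743e'  # #7 Utf8 "<init>"
)
entries = read_entries(pool_bytes, 7)
```

Because slot 0 of the constant pool is unused, entries[0] in this sketch corresponds to constant pool index #1.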


Exploring Class Files

The most common use of the program is to explore class files interactively, starting with an overview and drilling down to the points of interest. This is the default mode of the application. You can either preload a number of classes by providing the class file locations (or class names) on the command line, or you can load files from inside the environment. Classes that are automatically loaded are assigned to a Python variable which matches the class name. A variable named loaded also exists; it is a list that contains all the auto-loaded classfiles. Classfiles which are loaded interactively are not assigned variables, nor are they added to the loaded list (but you can do these things yourself if you want to).

Automatically-loaded classes which are in the default package will be assigned to simple name variables which match the class name. Classes which are in packages will be accessible through their full dotted-package name.

user@home> ./jallie Hello.class com/foo/Main.class
Loaded class: "Hello"
Loaded class: "com/foo/Main"
>>> print loaded
[ ClassFile(Hello), ClassFile(com/foo/Main) ]
>>> print com.foo.Main
ClassFile  {
  u4 magic = 0xcafebabe
  u2 minor_version = 0
  u2 major_version = 50
  u2 constant_pool_count = 29
  cp_info constant_pool[29] = { ... }
  u2 access_flags = ( ACC_PUBLIC, ACC_SUPER )
  u2 this_class = #5 // "com/foo/Main"
  u2 super_class = #6 // "java/lang/Object"
  u2 interfaces_count = 0
  u2 interfaces[0] = { ... }
  u2 fields_count = 0
  field_info fields[0] = { ... }
  u2 methods_count = 2
  method_info methods[2] = { ... }
  u2 attributes_count = 1
  attribute_info attributes[1] = { ... }
}
    

Classes can also be loaded from a specific file on-the-fly, or empty classes can be created:

>>> cf = ClassFile('Goodbye.class')
>>> print cf.access_flags
33
>>> empty = ClassFile()
>>> print empty
ClassFile  {
  u4 magic = 0xcafebabe
  u2 minor_version = 0
  u2 major_version = 0
  u2 constant_pool_count = 1
  cp_info constant_pool[1] = { ... }
  u2 access_flags = (  )
  u2 this_class = #0 // NULL
  u2 super_class = #0 // NULL
  u2 interfaces_count = 0
  u2 interfaces[0] = { ... }
  u2 fields_count = 0
  field_info fields[0] = { ... }
  u2 methods_count = 0
  method_info methods[0] = { ... }
  u2 attributes_count = 0
  attribute_info attributes[0] = { ... }
}
>>>

    

By default, printing any structure or array will display fields and primitive values, but will not recursively print substructures (see Settings for details on how to change this). Many primitive values are translated or annotated to help display their meaning. However, they are still primitives which can be displayed on their own and even mixed into expressions.

>>> print Hello.major_version
50
>>> print Hello.access_flags
33
>>> print Hello.magic
3405691582
>>> print Hello.minor_version + 4
4
>>>
    

The values displayed as { ... } indicate that the field contains more data but that the data is omitted from the current display. Print the specific field to see its details. In many cases, a summary is printed instead of the ellipsis. Even when a summary is displayed, printing the field individually will display the more detailed breakdown of data for that field. Array elements can be accessed using the usual array syntax. Python's slicing syntax can also be used to display only part of an array.

Drilling down for details:

>>> print Hello
ClassFile  {
  u4 magic = 0xcafebabe
  u2 minor_version = 0
  u2 major_version = 50
  u2 constant_pool_count = 29
  cp_info constant_pool[29] = { ... }
  u2 access_flags = ( ACC_PUBLIC, ACC_SUPER )
  u2 this_class = #5 // "Hello"
  u2 super_class = #6 // "java/lang/Object"
  u2 interfaces_count = 0
  u2 interfaces[0] = { ... }
  u2 fields_count = 0
  field_info fields[0] = { ... }
  u2 methods_count = 2
  method_info methods[2] = { ... }
  u2 attributes_count = 1
  attribute_info attributes[1] = { ... }
}
>>> print Hello.methods
method_info[2] {
  [ 0 ] method_info( "<init>", "()V" )
  [ 1 ] method_info( "main", "([Ljava/lang/String;)V" )
}
>>> print Hello.methods[1]
method_info  {
  u2 access_flags = ( ACC_PUBLIC, ACC_STATIC )
  u2 name_index = #11 // "main"
  u2 descriptor_index = #12 // "([Ljava/lang/String;)V"
  u2 attributes_count = 1
  attribute_info attributes[1] = { ... }
}
>>> print Hello.methods[1].attributes
attribute_info[1] {
  [ 0 ] Code_attribute: { ... }
}
>>> print Hello.methods[1].attributes[0]
Code_attribute  {
  u2 attribute_name_index = #9 // "Code"
  u4 attribute_length = 37
  u2 max_stack = 2
  u2 max_locals = 1
  u4 code_length = 9
  bytecode code[4] = { ... }
  u2 exception_table_length = 0
  exception_table_entry exception_table[0] = { ... }
  u2 attributes_count = 1
  attribute_info attributes[1] = { ... }
}
>>> print Hello.methods[1].attributes[0].code
bytecode[4] {
  [  0 ] L0:   getstatic #2 // java/lang/System.out Ljava/io/PrintStream;
  [  1 ] L3:   ldc #3 // "Hello, world!"
  [  2 ] L5:   invokevirtual #4 // java/io/PrintStream.println (Ljava/lang/String;)V
  [  3 ] L8:   vreturn
}
>>>
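
The bytecode listing above maps nine code bytes onto four instructions at offsets 0, 3, 5, and 8. As a rough illustration of how those offsets arise, here is a standalone Python 3 sketch with a deliberately tiny opcode table. The names OPCODES and disassemble are hypothetical and this is not how jallie itself is implemented. Opcode 0xb1 is called return in the JVM specification; the jallie listing shows it as vreturn.

```python
# Mnemonic and operand size in bytes for the four opcodes used by Hello.main.
OPCODES = {
    0x12: ('ldc', 1),
    0xb1: ('return', 0),
    0xb2: ('getstatic', 2),
    0xb6: ('invokevirtual', 2),
}

def disassemble(code):
    """Return a list of (bci, mnemonic, operand) tuples for a code array."""
    result, bci = [], 0
    while bci < len(code):
        mnemonic, size = OPCODES[code[bci]]
        # Operands are big-endian constant pool indexes for these opcodes.
        operand = int.from_bytes(code[bci + 1:bci + 1 + size], 'big') if size else None
        result.append((bci, mnemonic, operand))
        bci += 1 + size  # one opcode byte plus its operand bytes
    return result

# The nine code bytes of Hello.main, taken from the hex dumps in this manual.
code = bytes.fromhex('b200021203b60004b1')
listing = disassemble(code)
```

This also shows why code_length is 9 while the code array has 4 elements: the length counts bytes, the array counts instructions.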
    

Any internal structure is addressable and can be assigned to a local variable for later reference. This can save some typing when exploring deep in the data structure. All assignments are reference assignments, not copies. This is not important now, but is something to keep in mind if you start changing fields. (Local variables need no declaration; assigning to them creates them.)

>>> code = Hello.methods[1].attributes[0].code
>>> print code
bytecode[4] {
  [  0 ] L0:   getstatic #2 // java/lang/System.out Ljava/io/PrintStream;
  [  1 ] L3:   ldc #3 // "Hello, world!"
  [  2 ] L5:   invokevirtual #4 // java/io/PrintStream.println (Ljava/lang/String;)V
  [  3 ] L8:   vreturn
}
>>> print code[2]
invokevirtual  {
  u1 opcode = 0xb6
  u2 index = #4 // java/io/PrintStream.println (Ljava/lang/String;)V
}
    

There are a number of things one can do with any substructure reference in the classfile. The classfile() method returns a reference to the classfile that the reference is a part of. The size() method returns the on-disk size in bytes of the reference and everything it contains. The hex() method shows the actual bytes in hex (in a format understood by the xxd program).

>>> print Hello.major_version.size()
2
>>> Hello.major_version.hex()
0000000: 0032
>>> print Hello.methods[1].size()
51
>>> Hello.methods[1].hex()
0000000: 0009 000b 000c 0001 0009 0000 0025 0002
0000010: 0001 0000 0009 b200 0212 03b6 0004 b100
0000020: 0000 0100 0a00 0000 0a00 0200 0000 0300
0000030: 0800 04
>>> Hello.methods[1].attributes[0].hex()
0000000: 0009 0000 0025 0002 0001 0000 0009 b200
0000010: 0212 03b6 0004 b100 0000 0100 0a00 0000
0000020: 0a00 0200 0000 0300 0800 04
>>>
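
For readers who want to reproduce this layout outside jallie, the sketch below formats a byte string the same way: a seven-digit hex offset, then up to sixteen bytes in two-byte groups. It is a standalone Python 3 approximation; the helper name xxd_lines is an assumption, and the sample bytes are copied from the Hello.methods[1] dump above.

```python
def xxd_lines(data):
    """Format bytes as offset-prefixed lines of 2-byte hex groups, 16 bytes per line."""
    lines = []
    for offset in range(0, len(data), 16):
        digits = data[offset:offset + 16].hex()
        groups = [digits[i:i + 4] for i in range(0, len(digits), 4)]
        lines.append('%07x: %s' % (offset, ' '.join(groups)))
    return lines

# The 51 bytes of the main method, as printed by Hello.methods[1].hex().
method_bytes = bytes.fromhex(
    '0009000b000c00010009000000250002'
    '000100000009b200021203b60004b100'
    '000001000a0000000a00020000000300'
    '080004'
)
dump = xxd_lines(method_bytes)
```

The length of method_bytes matches the value reported by size(), since both count every byte of the structure and its nested attributes.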
    

Of course, calling hex() upon the classfile will get you the entire binary file.

>>> Hello.hex()
0000000: cafe babe 0000 0032 001d 0a00 0600 0f09
0000010: 0010 0011 0800 120a 0013 0014 0700 1507
0000020: 0016 0100 063c 696e 6974 3e01 0003 2829
0000030: 5601 0004 436f 6465 0100 0f4c 696e 654e
0000040: 756d 6265 7254 6162 6c65 0100 046d 6169
0000050: 6e01 0016 285b 4c6a 6176 612f 6c61 6e67
0000060: 2f53 7472 696e 673b 2956 0100 0a53 6f75
0000070: 7263 6546 696c 6501 000a 4865 6c6c 6f2e
0000080: 6a61 7661 0c00 0700 0807 0017 0c00 1800
0000090: 1901 000d 4865 6c6c 6f2c 2077 6f72 6c64
00000a0: 2107 001a 0c00 1b00 1c01 0005 4865 6c6c
00000b0: 6f01 0010 6a61 7661 2f6c 616e 672f 4f62
00000c0: 6a65 6374 0100 106a 6176 612f 6c61 6e67
00000d0: 2f53 7973 7465 6d01 0003 6f75 7401 0015
00000e0: 4c6a 6176 612f 696f 2f50 7269 6e74 5374
00000f0: 7265 616d 3b01 0013 6a61 7661 2f69 6f2f
0000100: 5072 696e 7453 7472 6561 6d01 0007 7072
0000110: 696e 746c 6e01 0015 284c 6a61 7661 2f6c
0000120: 616e 672f 5374 7269 6e67 3b29 5600 2100
0000130: 0500 0600 0000 0000 0200 0100 0700 0800
0000140: 0100 0900 0000 1d00 0100 0100 0000 052a
0000150: b700 01b1 0000 0001 000a 0000 0006 0001
0000160: 0000 0001 0009 000b 000c 0001 0009 0000
0000170: 0025 0002 0001 0000 0009 b200 0212 03b6
0000180: 0004 b100 0000 0100 0a00 0000 0a00 0200
0000190: 0000 0300 0800 0400 0100 0d00 0000 0200
00001a0: 0e
>>>
    

Every reference also has a path() method which details the path from the classfile to the reference.

>>> print Hello.constant_pool[16].name_index.path()
Hello.constant_pool[16].name_index
>>> print code.path()
Hello.methods[1].attributes[0].code
    

Modifying the Class Files

Changing a primitive value is usually as simple as typing in the assignment statement.

>>> print Hello.major_version
50
>>> Hello.major_version = 49
>>> Hello.minor_version = 3
>>> print Hello
ClassFile  {
  u4 magic = 0xcafebabe
  u2 minor_version = 3
  u2 major_version = 49
  u2 constant_pool_count = 29
  cp_info constant_pool[29] = { ... }
  u2 access_flags = ( ACC_PUBLIC, ACC_SUPER )
  u2 this_class = #5 // "Hello"
  u2 super_class = #6 // "java/lang/Object"
  u2 interfaces_count = 0
  u2 interfaces[0] = { ... }
  u2 fields_count = 0
  field_info fields[0] = { ... }
  u2 methods_count = 2
  method_info methods[2] = { ... }
  u2 attributes_count = 1
  attribute_info attributes[1] = { ... }
}
>>>
    

For access flags, the ACC_<flag> variables are defined and can be used to set the access flag bitfields. Use Python's bitwise operators (| for or, & for and, ~ for not) to combine them into the value you want. There are similar definitions for the T_<type> constants used in the newarray instruction.

>>> print ACC_PROTECTED
4
>>> Hello.access_flags = ACC_SUPER | ACC_PROTECTED
>>> print Hello.access_flags
36
>>> print Hello
ClassFile  {
  u4 magic = 0xcafebabe
  u2 minor_version = 3
  u2 major_version = 49
  u2 constant_pool_count = 29
  cp_info constant_pool[29] = { ... }
  u2 access_flags = ( ACC_PROTECTED, ACC_SUPER )
  u2 this_class = #5 // "Hello"
  u2 super_class = #6 // "java/lang/Object"
  u2 interfaces_count = 0
  u2 interfaces[0] = { ... }
  u2 fields_count = 0
  field_info fields[0] = { ... }
  u2 methods_count = 2
  method_info methods[2] = { ... }
  u2 attributes_count = 1
  attribute_info attributes[1] = { ... }
}
>>>
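
The flag arithmetic can be tried with plain integers. This standalone sketch defines its own copies of a few flag values from the JVM specification instead of using jallie's variables, and the helper flag_names is a hypothetical name used only here.

```python
# A few access flag bits from the JVM specification. 0x0020 means ACC_SUPER on a
# class; the same bit is ACC_SYNCHRONIZED on a method.
FLAGS = [
    ('ACC_PUBLIC', 0x0001),
    ('ACC_PRIVATE', 0x0002),
    ('ACC_PROTECTED', 0x0004),
    ('ACC_STATIC', 0x0008),
    ('ACC_FINAL', 0x0010),
    ('ACC_SUPER', 0x0020),
]

def flag_names(value):
    """List the names of the flag bits that are set in `value`."""
    return [name for name, bit in FLAGS if value & bit]

ACC_PUBLIC, ACC_PROTECTED, ACC_SUPER = 0x0001, 0x0004, 0x0020

original = ACC_PUBLIC | ACC_SUPER      # 33, the value printed for Hello
with_bit = original | ACC_PROTECTED    # set a bit
modified = with_bit & ~ACC_PUBLIC      # clear a bit
```

Setting a bit with | and clearing one with & ~ leaves all other bits untouched, which is usually what you want when changing a single flag.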
    

Some fields do not make sense to change, such as the Java signature (the magic field). Also, there are many places in the classfile where a length field delineates the length of the array that follows. This is crucial when reading in the array, but once loaded, the arrays are represented as Python lists, which have an implicit length. For this reason, the data structure mechanisms simply use the array length to update the length field dynamically, so there's no need to change the field yourself. Change the array instead and the field will be updated automatically. This is enforced: trying to write to an automatically set length field is an error (this goes for attribute lengths as well).

>>> Hello.magic = 0x8badbeef
Error performing assignment: Field is read-only
>>> Hello.constant_pool_count = 10
Error performing assignment: Field is a length field (change the array instead)
>>> Hello.interfaces_count = 4
Error performing assignment: Field is a length field (change the array instead)
>>> Hello.interfaces += [ 4, 5, 6, 7 ]
>>> print Hello.interfaces_count
4
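
One common way to get this behavior in Python is a read-only property computed from the list. The sketch below illustrates the idea only; the class CountedTable is hypothetical, and jallie's real implementation and error messages may differ.

```python
class CountedTable(object):
    """Hypothetical stand-in for a structure whose count field mirrors a list."""

    def __init__(self):
        self.interfaces = []

    @property
    def interfaces_count(self):
        # Computed on demand, so it can never disagree with the list.
        return len(self.interfaces)

table = CountedTable()
table.interfaces += [4, 5, 6, 7]   # the count follows the list automatically

try:
    table.interfaces_count = 10    # no setter is defined, so this fails
    rejected = False
except AttributeError:
    rejected = True
```

Deriving the count from the list removes a whole class of inconsistent files, at the cost of not being able to write a deliberately wrong count.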
    

Similarly, some fields are discriminator values which determine the type of the structure in arrays which have variable-typed contents. The tag field in cp_info is one example. Once the data has been parsed, the type of the structure itself carries this information, so the tag value is redundant. It is displayed for completeness, but is read-only. If you want a different discriminator value, replace the entire array entry with an instance of the new type rather than trying to change the value.

>>> print Hello.constant_pool[9]
CONSTANT_Utf8_info  {
  u1 tag = 1
  u2 length = 4
  u1 bytes[] = "Code"
}
>>> Hello.constant_pool[9].tag = 7
Error performing assignment: Can't change cp_info tag
>>> Hello.constant_pool[9] = CONSTANT_Class_info()
>>> print Hello.constant_pool[9]
CONSTANT_Class_info  {
  u1 tag = 7
  u2 name_index = #0 // NULL
}
    

When making changes to bytecode which change its size, jallie attempts to adjust the affected code offsets in the classfile, such as the inter-code bci offsets in branch instructions, the exception handler table, and the LocalVariableTable, LocalVariableTypeTable, LineNumberTable, and StackMapTable. It is wise to double-check these values after modifying bytecode to ensure they were adjusted correctly.
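
When double-checking by hand, it helps to know what the adjusted values should be. The following standalone sketch computes them for the simple case of inserting bytes at one position. It does not describe jallie's actual algorithm, and the function names shift_bci and adjust_branch are assumptions made for this example.

```python
def shift_bci(bci, insert_at, size):
    """Map an old bytecode index to its index after `size` bytes are inserted at `insert_at`."""
    # Policy choice in this sketch: an index equal to insert_at moves with the
    # original instruction, so it ends up after the inserted bytes.
    return bci + size if bci >= insert_at else bci

def adjust_branch(source, offset, insert_at, size):
    """Return the new (source bci, relative offset) for a branch instruction."""
    target = source + offset
    new_source = shift_bci(source, insert_at, size)
    new_target = shift_bci(target, insert_at, size)
    return new_source, new_target - new_source

# Insert 3 bytes at bci 5 and see how three branches are affected.
forward = adjust_branch(2, 6, insert_at=5, size=3)     # jumps over the insertion
backward = adjust_branch(10, -8, insert_at=5, size=3)  # jumps back across it
untouched = adjust_branch(0, 3, insert_at=5, size=3)   # entirely before it
```

Absolute indexes, such as the start_pc values in a LineNumberTable, would be run through shift_bci alone.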

Saving Classes

Once you've made modifications to the classfile, you can write the modified bytes back to a classfile for later use using the write() method. You can write to the default file name, or specify a specific file to write to. If no extension is given, a '.class' extension is assumed.

>>> Hello.write()
Wrote 481 bytes to file 'Hello.class'
>>> Hello.major_version = 49
>>> Hello.minor_version = 3
>>> Hello.write('Hello_v49')
Wrote 481 bytes to file 'Hello_v49.class'
>>> Hello.write('bytes.bin')
Wrote 481 bytes to file 'bytes.bin'
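
The extension rule described above can be expressed in a few lines. This is only a sketch of the stated behavior using the standard os.path module; the function name output_name is hypothetical and jallie may implement the rule differently.

```python
import os.path

def output_name(name):
    """Append '.class' when `name` has no extension; otherwise return it unchanged."""
    extension = os.path.splitext(name)[1]
    return name if extension else name + '.class'

names = [output_name(n) for n in ('Hello_v49', 'bytes.bin', 'Hello.class')]
```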
    

Settings

There are a few user-controllable settings that can be used to change the default behavior. These settings are contained in the Settings class and can be assigned to at any time.

Setting            Default  Description
extraDetailLevels  0        The number of additional levels of detail to display
indent             '  '     What is used as the indent string
verbose            True     Verbosity-level. Set to False to reduce system messages

For example:

>>> Settings.extraDetailLevels=2
>>> Settings.indent='"""'
>>> print Hello.methods[1]
method_info  {
"""u2 access_flags = ( ACC_PUBLIC, ACC_STATIC )
"""u2 name_index = #11 // "main"
"""u2 descriptor_index = #12 // "([Ljava/lang/String;)V"
"""u2 attributes_count = 1
"""attribute_info attributes[1] {
""""""[ 0 ] attribute_info  {
"""""""""u2 attribute_name_index = #9 // "Code"
"""""""""u4 attribute_length = 37
"""""""""u2 max_stack = 2
"""""""""u2 max_locals = 1
"""""""""u4 code_length = 9
"""""""""bytecode code[4] = { ... }
"""""""""u2 exception_table_length = 0
"""""""""exception_table_entry exception_table[0] = { ... }
"""""""""u2 attributes_count = 1
"""""""""attribute_info attributes[1] = { ... }
""""""}
"""}
}
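
The interaction between the two display settings can be modeled with a small recursive printer. This is a hypothetical sketch over nested dictionaries, written in Python 3 and unrelated to jallie's real data structures; the function render and the sample tree are invented for illustration.

```python
def render(name, value, extra_levels=0, indent='  ', depth=0):
    """Render nested dicts, expanding `extra_levels` levels below the top one."""
    pad = indent * depth
    if not isinstance(value, dict):
        return ['%s%s = %r' % (pad, name, value)]
    if depth > extra_levels:
        # Too deep for the requested level of detail: show the ellipsis form.
        return ['%s%s = { ... }' % (pad, name)]
    lines = ['%s%s {' % (pad, name)]
    for key, child in value.items():
        lines.extend(render(key, child, extra_levels, indent, depth + 1))
    lines.append(pad + '}')
    return lines

tree = {'access_flags': 9, 'attributes': {'max_stack': 2, 'code': {'length': 9}}}
brief = render('method_info', tree)
detailed = render('method_info', tree, extra_levels=1, indent='..')
```

With the default of 0 extra levels only the top structure is expanded; each additional level expands one more layer of nesting before falling back to { ... }.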
    

Scripting

Jallie can also be used to bulk-process classfiles. This can be used for summarizing, searching, or algorithmically modifying classfiles. To use the processing capability, create a module with a module-level function named for_ClassFile which takes two arguments: a ClassFile object and an optional command-line parameter. When the module name is passed to the --processor flag, instead of entering interactive mode, jallie will parse each classfile in its command-line list and call for_ClassFile with each classfile as an argument. If a --processor_arg value is specified on the command line, it is passed as the second argument to for_ClassFile. For example:

> cat version.py

def for_ClassFile(cf, arg):
  print '%s %s: %d.%d: num methods: %d' % \
    ( arg, cf.classname(), cf.major_version, cf.minor_version, len(cf.methods) )

> jallie -q --processor=version --processor_arg=asdf Hello LockExample 2>/dev/null
asdf Hello: 51.0: num methods: 2
asdf LockExample: 49.0: num methods: 4
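
As a second illustration, here is a sketch of a processor that reports which Java release each classfile targets. The module name javaversion, the helper describe, and the JAVA_RELEASE table are assumptions made for this example. It relies only on the for_ClassFile convention and the cf.classname(), cf.major_version, and cf.minor_version members used in the example above, and it assumes that int() accepts jallie's primitive values, which this manual does not confirm.

```python
# javaversion.py -- a hypothetical processor module for --processor=javaversion

# Class file major versions and the Java release associated with each.
JAVA_RELEASE = {
    45: '1.0/1.1', 46: '1.2', 47: '1.3', 48: '1.4',
    49: '5', 50: '6', 51: '7', 52: '8',
}

def describe(cf, arg):
    """Build one summary line; `arg` is the optional --processor_arg value."""
    # Assumption: jallie's primitive fields convert cleanly with int().
    release = JAVA_RELEASE.get(int(cf.major_version), 'unknown')
    prefix = '%s ' % arg if arg else ''
    return '%s%s: class file %d.%d (Java %s)' % (
        prefix, cf.classname(), cf.major_version, cf.minor_version, release)

def for_ClassFile(cf, arg):
    # Called by jallie once per parsed classfile.
    print(describe(cf, arg))
```

Keeping the formatting in describe and the printing in for_ClassFile makes the logic easy to try with a stand-in object before running it over real classfiles.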
    
