Jallie User Manual
Introduction
JALLIE is a tool that can be used to disassemble, explore, update, and
write to the Java class file format. It has a few batch-mode options
for disassembling or running scripts over class files, but its most
powerful usage is running in interactive mode with the class file loaded
into memory. In this mode, you have a Python command prompt and full
access to all of Python, and the class is parsed and loaded into
specialized data structures that make exploring, dumping, or changing
the classfile data safe and easy. In this mode you're really running
a glorified binary editor which can take care of much of the class file
bookkeeping for you.
As this program deals with Java classfiles at a rather low level, it is
useful to refer to the Java Virtual Machine Specification when
familiarizing yourself with the program. Especially useful is
Chapter 4, which defines the data formats and layout of the
data structures in a classfile. Each structure defined in this chapter
is present in jallie, with the same name. When browsing through
a classfile with jallie, the intent is that you'll see the
same exact structures with the attached data for the instance of the
classfile you're looking at.
This manual demonstrates concepts using examples. The olive-colored
boxes show the exact commands typed and the response that
jallie provides. If you're familiar with Python, you may well
notice that the command interface is simply a Python interpreter,
so you may run any Python commands you wish. If you're not familiar
with Python, don't worry: just about everything you will want to do
uses a small, straightforward subset of Python
syntax. You can easily use this program without knowing any Python by
following the examples in this document.
Just about all of the examples assume a simple "Hello world" classfile
in the current directory. Here's the Java source:
public class Hello {
    public static void main(String argv[]) {
        System.out.println("Hello, world!");
    }
}
Just compile it, and then you can follow along:
user@home> javac Hello.java
Getting Started
As of now, there's no fancy installation or unpacking needed for using
the program. You can simply download the
latest tar/zip file, unzip
it with your favorite unzip program, change the file permissions to
execute, and then you can run it. It does require
Python to be installed and available on your path.
Here's how it's done on just about any Unix-based platform
(or Cygwin, on Windows):
user@home> gunzip -dc jallie-XXX.tar.gz | tar xf -
user@home> chmod +x jallie
user@home> ./jallie Hello.class
...
Disassembling Class Files
The most straightforward way to view your class files is to simply
disassemble them. You can do this by running jallie in batch mode with
the --dis option, in which case it will disassemble all files listed on
the command line and print them out in excruciating detail (right down
to the fields of the bytecodes). Files can be listed as relative or
absolute paths, or you can provide the dotted-package name format (like
you would for java or javah). When disassembly has completed, the
program will exit.
user@home> ./jallie --dis Hello.class 2>/dev/null
ClassFile {
u4 magic = 0xcafebabe
u2 minor_version = 0
u2 major_version = 50
u2 constant_pool_count = 29
cp_info constant_pool[29] {
[ 0 ] [Unused]
[ 1 ] CONSTANT_Methodref_info {
u1 tag = 10
u2 class_index = #6 // "java/lang/Object"
u2 name_and_type_index = #15 // <init> ()V
}
[ 2 ] CONSTANT_Fieldref_info {
u1 tag = 9
u2 class_index = #16 // "java/lang/System"
u2 name_and_type_index = #17 // out Ljava/io/PrintStream;
}
[...]
user@home>
The full output was clipped for brevity, but is available
here if you want to browse through to get
an idea of what is displayed. The 2>/dev/null part of the
command just hides the Python clutter that gets sent to stderr.
Things to notice here:
- Each structure is displayed using its name (ClassFile,
CONSTANT_Methodref_info), and a list of its fields
with the field values displayed inline. Primitive fields
(u4, u2) have their values displayed directly.
When there's additional meaning associated with a primitive field
value, the value is formatted, annotated, or sometimes replaced
by the meaning rather than just the raw byte values.
- Array values are displayed in typical array notation: the base
type of the array, the name of the field, and then brackets which
contain the number of elements. Each element in the array is
prefixed with its index in brackets, followed by the contents of
the element.
- In some places in the classfile, an array can only hold one
type of data. In most other places, though, the array is defined with
a base type but actually contains instances of subclasses of that
base type. The base type is used in the field specification of the
array, but the actual subclasses are shown as elements when appropriate.
- The hash (#) notation is used to indicate constant pool
index values.
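To make the first few fields of the listing concrete, here is a minimal sketch that decodes them by hand with Python's standard struct module. It does not use jallie at all; the ten bytes are the start of Hello.class as they appear in the hex() output of this manual, and the variable names simply mirror the field names in the disassembly.

```python
import struct

# First ten bytes of Hello.class, copied from the hex dump in this manual.
header = bytes.fromhex("cafebabe" "0000" "0032" "001d")

# Class files are big-endian: u4 magic, then three u2 fields.
magic, minor_version, major_version, constant_pool_count = struct.unpack(">IHHH", header)

print("u4 magic = 0x%08x" % magic)
print("u2 minor_version = %d" % minor_version)
print("u2 major_version = %d" % major_version)
print("u2 constant_pool_count = %d" % constant_pool_count)
```

The printed values should agree with the first four fields of the ClassFile structure shown above.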
Exploring Class Files
The most common use of the program is to explore class files interactively
by starting with an overview and drilling down to the points of interest.
This is the default mode of the application. You can either preload
a number of classes by providing the class file locations (or class
names) on the command line, or you can load files from inside the
environment. Classes that are automatically loaded are assigned to a
Python variable which matches the class name. A variable named loaded
also exists; it is a list that contains all the auto-loaded classfiles.
Classfiles which are loaded interactively are not assigned variables,
nor are they added to the loaded variable (but you can do
these things yourself if you want to).
Automatically-loaded classes which are in the default package will
be assigned to simple name variables which match the class name.
Classes which are in packages will be accessible through their full
dotted-package name.
user@home> ./jallie Hello.class com/foo/Main.class
Loaded class: "Hello"
Loaded class: "com/foo/Main"
>>> print loaded
[ ClassFile(Hello), ClassFile(com/foo/Main) ]
>>> print com.foo.Main
ClassFile {
u4 magic = 0xcafebabe
u2 minor_version = 0
u2 major_version = 50
u2 constant_pool_count = 29
cp_info constant_pool[29] = { ... }
u2 access_flags = ( ACC_PUBLIC, ACC_SUPER )
u2 this_class = #5 // "com/foo/Main"
u2 super_class = #6 // "java/lang/Object"
u2 interfaces_count = 0
u2 interfaces[0] = { ... }
u2 fields_count = 0
field_info fields[0] = { ... }
u2 methods_count = 2
method_info methods[2] = { ... }
u2 attributes_count = 1
attribute_info attributes[1] = { ... }
}
Classes can also be loaded from a specific file on-the-fly, or
empty classes can be created:
>>> cf = ClassFile('Goodbye.class')
>>> print cf.access_flags
33
>>> empty = ClassFile()
>>> print empty
ClassFile {
u4 magic = 0xcafebabe
u2 minor_version = 0
u2 major_version = 0
u2 constant_pool_count = 1
cp_info constant_pool[1] = { ... }
u2 access_flags = ( )
u2 this_class = #0 // NULL
u2 super_class = #0 // NULL
u2 interfaces_count = 0
u2 interfaces[0] = { ... }
u2 fields_count = 0
field_info fields[0] = { ... }
u2 methods_count = 0
method_info methods[0] = { ... }
u2 attributes_count = 0
attribute_info attributes[0] = { ... }
}
>>>
By default, printing any structure or array will display fields
and primitive values, but will not recursively print substructures
(see Settings for details on how to change this).
Many primitive values are translated or annotated to help display
their meaning. However, they are still primitives which can be displayed
on their own and even mixed into expressions.
>>> print Hello.major_version
50
>>> print Hello.access_flags
33
>>> print Hello.magic
3405691582
>>> print Hello.minor_version + 4
4
>>>
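The raw numbers printed above relate to the annotated display in a simple way. The following sketch, which runs in plain Python without jallie, shows how 33 maps to the flag names and how 3405691582 is the decimal form of the magic number. The bit values come from the class access flag table in the Java Virtual Machine Specification; the flag_names helper is a hypothetical name used only for this illustration.

```python
# A subset of the class access flag bits from the JVM specification.
CLASS_ACCESS_FLAGS = [
    (0x0001, "ACC_PUBLIC"),
    (0x0010, "ACC_FINAL"),
    (0x0020, "ACC_SUPER"),
    (0x0200, "ACC_INTERFACE"),
    (0x0400, "ACC_ABSTRACT"),
]

def flag_names(value):
    """Return the names of the flag bits that are set in value."""
    return [name for bit, name in CLASS_ACCESS_FLAGS if value & bit]

print(flag_names(33))       # ['ACC_PUBLIC', 'ACC_SUPER']
print(hex(3405691582))      # 0xcafebabe
```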
The values displayed as { ... } indicate that the field contains
more data but that the data is omitted from the current display. You can
print the specific field to see the details for that field. In many
cases, a summary is printed instead of the ellipsis. Even when a
summary is displayed, printing the field individually will display the
more detailed breakdown of data for that field. Array elements can
be accessed using the typical array syntax. Python also provides
array slicing, which can be used to display only
part of an array.
Drilling down for details:
>>> print Hello
ClassFile {
u4 magic = 0xcafebabe
u2 minor_version = 0
u2 major_version = 50
u2 constant_pool_count = 29
cp_info constant_pool[29] = { ... }
u2 access_flags = ( ACC_PUBLIC, ACC_SUPER )
u2 this_class = #5 // "Hello"
u2 super_class = #6 // "java/lang/Object"
u2 interfaces_count = 0
u2 interfaces[0] = { ... }
u2 fields_count = 0
field_info fields[0] = { ... }
u2 methods_count = 2
method_info methods[2] = { ... }
u2 attributes_count = 1
attribute_info attributes[1] = { ... }
}
>>> print Hello.methods
method_info[2] {
[ 0 ] method_info( "<init>", "()V" )
[ 1 ] method_info( "main", "([Ljava/lang/String;)V" )
}
>>> print Hello.methods[1]
method_info {
u2 access_flags = ( ACC_PUBLIC, ACC_STATIC )
u2 name_index = #11 // "main"
u2 descriptor_index = #12 // "([Ljava/lang/String;)V"
u2 attributes_count = 1
attribute_info attributes[1] = { ... }
}
>>> print Hello.methods[1].attributes
attribute_info[1] {
[ 0 ] Code_attribute: { ... }
}
>>> print Hello.methods[1].attributes[0]
Code_attribute {
u2 attribute_name_index = #9 // "Code"
u4 attribute_length = 37
u2 max_stack = 2
u2 max_locals = 1
u4 code_length = 9
bytecode code[4] = { ... }
u2 exception_table_length = 0
exception_table_entry exception_table[0] = { ... }
u2 attributes_count = 1
attribute_info attributes[1] = { ... }
}
>>> print Hello.methods[1].attributes[0].code
bytecode[4] {
[ 0 ] L0: getstatic #2 // java/lang/System.out Ljava/io/PrintStream;
[ 1 ] L3: ldc #3 // "Hello, world!"
[ 2 ] L5: invokevirtual #4 // java/io/PrintStream.println (Ljava/lang/String;)V
[ 3 ] L8: vreturn
}
>>>
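Slicing is mentioned above but not shown. As a reminder of the syntax, here is a small sketch that uses an ordinary Python list of strings as a stand-in for the bytecode array. Whether jallie's array objects accept every slice form shown here is an assumption; the list contents are just the mnemonics from the listing above.

```python
# A plain list standing in for Hello.methods[1].attributes[0].code.
code = ["getstatic #2", "ldc #3", "invokevirtual #4", "vreturn"]

first_two = code[:2]    # elements 0 and 1
middle = code[1:3]      # elements 1 and 2
last_one = code[-1]     # negative indexes count from the end

print(first_two)
print(middle)
print(last_one)
```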
Any internal structure is addressable and can be assigned to a local
variable for later reference. This can save some typing when
exploring deep in the data structure. All assignments are reference
assignments, not copies. This is not important now, but is something
to keep in mind if you start changing fields. (Local variables need
no declaration: assigning to them creates them.)
>>> code = Hello.methods[1].attributes[0].code
>>> print code
bytecode[4] {
[ 0 ] L0: getstatic #2 // java/lang/System.out Ljava/io/PrintStream;
[ 1 ] L3: ldc #3 // "Hello, world!"
[ 2 ] L5: invokevirtual #4 // java/io/PrintStream.println (Ljava/lang/String;)V
[ 3 ] L8: vreturn
}
>>> print code[2]
invokevirtual {
u1 opcode = 0xb6
u2 index = #4 // java/io/PrintStream.println (Ljava/lang/String;)V
}
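The difference between a reference and a copy is ordinary Python behavior and can be seen without jallie. In this sketch the dictionaries are hypothetical stand-ins for method structures; copy.deepcopy is the standard library call for making an independent copy when one is wanted.

```python
import copy

# Plain dictionaries standing in for two method_info structures.
methods = [{"name": "<init>"}, {"name": "main"}]

alias = methods[1]                   # a reference to the same object
snapshot = copy.deepcopy(methods)    # an independent copy

alias["name"] = "start"              # changes the object that methods[1] refers to

print(methods[1]["name"])     # start
print(snapshot[1]["name"])    # main
```

A change made through the local variable is therefore a change to the classfile itself, which is what makes the shorthand useful once you start modifying fields.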
There are a number of things you can do with any substructure reference
in the classfile. The classfile() method returns a reference
to the classfile which the reference is a part of. The size()
method returns the on-disk size in bytes of the reference
and everything it contains. The hex() method shows the
actual bytes in hex (in a format understood by the program xxd).
>>> print Hello.major_version.size()
2
>>> Hello.major_version.hex()
0000000: 0032
>>> print Hello.methods[1].size()
51
>>> Hello.methods[1].hex()
0000000: 0009 000b 000c 0001 0009 0000 0025 0002
0000010: 0001 0000 0009 b200 0212 03b6 0004 b100
0000020: 0000 0100 0a00 0000 0a00 0200 0000 0300
0000030: 0800 04
>>> Hello.methods[1].attributes[0].hex()
0000000: 0009 0000 0025 0002 0001 0000 0009 b200
0000010: 0212 03b6 0004 b100 0000 0100 0a00 0000
0000020: 0a00 0200 0000 0300 0800 04
>>>
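The dump layout and the reported sizes can be checked by hand. The sketch below is not jallie's implementation; xxd_style is a hypothetical helper that reproduces the layout seen above (a seven-digit hex offset, then two-byte groups, sixteen bytes per line). The sample bytes are the Code attribute copied from the dump above.

```python
import struct

def xxd_style(data, width=16):
    """Format bytes as offset-prefixed lines of two-byte hex groups."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        groups = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        lines.append("%07x: %s" % (offset, " ".join(groups)))
    return "\n".join(lines)

# The Code attribute of Hello.main, copied from the dump above.
code_attribute = bytes.fromhex(
    "0009 0000 0025 0002 0001 0000 0009 b200"
    "0212 03b6 0004 b100 0000 0100 0a00 0000"
    "0a00 0200 0000 0300 0800 04"
)

print(xxd_style(code_attribute))

# u2 attribute_name_index and u4 attribute_length form a six-byte header;
# attribute_length counts only the bytes that follow that header.
name_index, attribute_length = struct.unpack(">HI", code_attribute[:6])
print(name_index, attribute_length, len(code_attribute))
```

The total of 43 bytes is the six-byte header plus the attribute_length of 37. Adding the four u2 fields of the enclosing method_info (8 bytes) gives the 51 reported by size().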
Of course, calling hex() upon the classfile will get you the
entire binary file.
>>> Hello.hex()
0000000: cafe babe 0000 0032 001d 0a00 0600 0f09
0000010: 0010 0011 0800 120a 0013 0014 0700 1507
0000020: 0016 0100 063c 696e 6974 3e01 0003 2829
0000030: 5601 0004 436f 6465 0100 0f4c 696e 654e
0000040: 756d 6265 7254 6162 6c65 0100 046d 6169
0000050: 6e01 0016 285b 4c6a 6176 612f 6c61 6e67
0000060: 2f53 7472 696e 673b 2956 0100 0a53 6f75
0000070: 7263 6546 696c 6501 000a 4865 6c6c 6f2e
0000080: 6a61 7661 0c00 0700 0807 0017 0c00 1800
0000090: 1901 000d 4865 6c6c 6f2c 2077 6f72 6c64
00000a0: 2107 001a 0c00 1b00 1c01 0005 4865 6c6c
00000b0: 6f01 0010 6a61 7661 2f6c 616e 672f 4f62
00000c0: 6a65 6374 0100 106a 6176 612f 6c61 6e67
00000d0: 2f53 7973 7465 6d01 0003 6f75 7401 0015
00000e0: 4c6a 6176 612f 696f 2f50 7269 6e74 5374
00000f0: 7265 616d 3b01 0013 6a61 7661 2f69 6f2f
0000100: 5072 696e 7453 7472 6561 6d01 0007 7072
0000110: 696e 746c 6e01 0015 284c 6a61 7661 2f6c
0000120: 616e 672f 5374 7269 6e67 3b29 5600 2100
0000130: 0500 0600 0000 0000 0200 0100 0700 0800
0000140: 0100 0900 0000 1d00 0100 0100 0000 052a
0000150: b700 01b1 0000 0001 000a 0000 0006 0001
0000160: 0000 0001 0009 000b 000c 0001 0009 0000
0000170: 0025 0002 0001 0000 0009 b200 0212 03b6
0000180: 0004 b100 0000 0100 0a00 0000 0a00 0200
0000190: 0000 0300 0800 0400 0100 0d00 0000 0200
00001a0: 0e
>>>
Every reference also has a path() method which details the
path from the classfile to the reference.
>>> print Hello.constant_pool[16].name_index.path()
Hello.constant_pool[16].name_index
>>> print code.path()
Hello.methods[1].attributes[0].code
Modifying the Class Files
Changing a primitive value is usually as simple as typing in the
assignment statement.
>>> print Hello.major_version
50
>>> Hello.major_version = 49
>>> Hello.minor_version = 3
>>> print Hello
ClassFile {
u4 magic = 0xcafebabe
u2 minor_version = 3
u2 major_version = 49
u2 constant_pool_count = 29
cp_info constant_pool[29] = { ... }
u2 access_flags = ( ACC_PUBLIC, ACC_SUPER )
u2 this_class = #5 // "Hello"
u2 super_class = #6 // "java/lang/Object"
u2 interfaces_count = 0
u2 interfaces[0] = { ... }
u2 fields_count = 0
field_info fields[0] = { ... }
u2 methods_count = 2
method_info methods[2] = { ... }
u2 attributes_count = 1
attribute_info attributes[1] = { ... }
}
>>>
For access flags, the ACC_<flag> variables are defined
and can be used to set the access flag bitfields. Use Python's
bitwise or (|), and (&), and not (~) operations to set the
appropriate bits. There are similar definitions for the
T_<type> constants used in the newarray
instruction.
>>> print ACC_PROTECTED
4
>>> Hello.access_flags = ACC_SUPER | ACC_PROTECTED
>>> print Hello.access_flags
36
>>> print Hello
ClassFile {
u4 magic = 0xcafebabe
u2 minor_version = 3
u2 major_version = 49
u2 constant_pool_count = 29
cp_info constant_pool[29] = { ... }
u2 access_flags = ( ACC_PROTECTED, ACC_SUPER )
u2 this_class = #5 // "Hello"
u2 super_class = #6 // "java/lang/Object"
u2 interfaces_count = 0
u2 interfaces[0] = { ... }
u2 fields_count = 0
field_info fields[0] = { ... }
u2 methods_count = 2
method_info methods[2] = { ... }
u2 attributes_count = 1
attribute_info attributes[1] = { ... }
}
>>>
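The example above only uses the or operation. As a sketch of the other two, here is the same arithmetic in plain Python, with the constants defined locally so that it runs outside jallie. The values 1, 4, and 0x20 are the JVM specification values for these flags and agree with the numbers printed in this manual.

```python
ACC_PUBLIC = 0x0001
ACC_PROTECTED = 0x0004    # the manual prints this constant as 4
ACC_SUPER = 0x0020

original = ACC_PUBLIC | ACC_SUPER                # 33, as in the unmodified Hello class
with_protected = original | ACC_PROTECTED        # "or" sets a bit
without_public = with_protected & ~ACC_PUBLIC    # "and" with "not" clears a bit
has_super = bool(without_public & ACC_SUPER)     # "and" tests a bit

print(original, with_protected, without_public, has_super)
```

Inside jallie the same expressions would be assigned to a field such as Hello.access_flags.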
It does not make sense to change some fields, such as the Java
signature (the magic field). Also, there are many places in
the classfile where a length field delineates the length of the following
array. This is crucial when reading in the array, but once loaded the
arrays are represented as Python lists, which have an implicit length.
For this reason, the data structure mechanisms simply use the array
length to update the length field dynamically, so there's no need to
change the field yourself. Change the array instead and the field
will be updated automatically. This is enforced: an error is reported
if you try to write to an automatically-set length field (this goes for
attribute lengths as well).
>>> Hello.magic = 0x8badbeef
Error performing assignment: Field is read-only
>>> Hello.constant_pool_count = 10
Error performing assignment: Field is a length field (change the array instead)
>>> Hello.interfaces_count = 4
Error performing assignment: Field is a length field (change the array instead)
>>> Hello.interfaces += [ 4, 5, 6, 7 ]
>>> print Hello.interfaces_count
4
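One common way to get this kind of behavior in Python is a read-only property that computes the count from the list. The sketch below illustrates the idea only; it is not taken from jallie's source, and the InterfaceTable class is a hypothetical stand-in.

```python
class InterfaceTable(object):
    """Hypothetical stand-in for a structure with a derived length field."""

    def __init__(self):
        self.interfaces = []

    @property
    def interfaces_count(self):
        # Always derived from the list, so it can never be out of date.
        return len(self.interfaces)

table = InterfaceTable()
table.interfaces += [4, 5, 6, 7]
print(table.interfaces_count)

# A property without a setter rejects direct assignment.
try:
    table.interfaces_count = 10
    rejected = False
except AttributeError:
    rejected = True
print(rejected)
```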
Similarly, some fields are discriminator values which determine the
type of the structure in arrays which have variable-typed contents.
The tag field in cp_info is an example
of this. Since the type of the parsed structure already carries this
information, the tag value is redundant after the data has been parsed. It
is displayed for completeness, but is read-only. If you want a different
discriminator value, replace the entire array entry with an instance of
the new type rather than trying to change the value.
>>> print Hello.constant_pool[9]
CONSTANT_Utf8_info {
u1 tag = 1
u2 length = 4
u1 bytes[] = "Code"
}
>>> Hello.constant_pool[9].tag = 7
Error performing assignment: Can't change cp_info tag
>>> Hello.constant_pool[9] = CONSTANT_Class_info()
>>> print Hello.constant_pool[9]
CONSTANT_Class_info {
u1 tag = 7
u2 name_index = #0 // NULL
}
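The idea that the type carries the tag can be sketched with a small class hierarchy. This is an illustration under assumptions, not jallie's code: the class names below are hypothetical, and only the tag values 1 and 7 are taken from the output above.

```python
class CpInfo(object):
    """Hypothetical base type; each subclass fixes its own tag."""
    TAG = None

    @property
    def tag(self):
        # Read-only: the tag follows from the class, not from stored data.
        return self.TAG

class Utf8Info(CpInfo):
    TAG = 1
    def __init__(self, text=""):
        self.text = text

class ClassInfo(CpInfo):
    TAG = 7
    def __init__(self, name_index=0):
        self.name_index = name_index

pool = [None, Utf8Info("Code")]
old_tag = pool[1].tag

# Changing the kind of entry means replacing the entry itself.
pool[1] = ClassInfo()
new_tag = pool[1].tag
print(old_tag, new_tag)
```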
When you make changes to bytecode which change its size, jallie attempts to
adjust the affected code offsets in the classfile, such as the inter-code
bci offsets in branch instructions, the exception handler table, and the
LocalVariableTable, LocalVariableTypeTable,
LineNumberTable, and StackMapTable. It would be wise to
double-check these values after modifying bytecode to ensure they were
adjusted correctly.
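When double-checking, it helps to know where the offsets come from: each instruction starts where the previous one ends. The sketch below recomputes the offsets for the main method of Hello from its nine code bytes, which are copied from the hex dump earlier in this manual. The length table covers only the four opcodes in that method, and instruction_offsets is a hypothetical helper, not part of jallie.

```python
# Total length in bytes (opcode plus operands) of the opcodes used in Hello.main.
INSTRUCTION_LENGTH = {
    0xb2: 3,    # getstatic: opcode + u2 constant pool index
    0x12: 2,    # ldc: opcode + u1 constant pool index
    0xb6: 3,    # invokevirtual: opcode + u2 constant pool index
    0xb1: 1,    # return (listed as "vreturn" in jallie's output)
}

# The nine code bytes of Hello.main, copied from the hex dump.
code_bytes = bytes.fromhex("b20002 1203 b60004 b1")

def instruction_offsets(code):
    """Return the starting offset of every instruction in code."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        pc += INSTRUCTION_LENGTH[code[pc]]
    return offsets

offsets = instruction_offsets(code_bytes)
print(offsets, len(code_bytes))
```

The result matches the L0, L3, L5, and L8 labels in the bytecode listing, and the total matches the code_length of 9.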
Saving Classes
Once you've made modifications to the classfile, you can write the modified
bytes back to a classfile for later use with the write() method.
You can write to the default file name, or name a specific file to write to.
If no extension is given, a '.class' extension is assumed.
>>> Hello.write()
Wrote 481 bytes to file 'Hello.class'
>>> Hello.major_version = 49
>>> Hello.minor_version = 3
>>> Hello.write('Hello_v49')
Wrote 481 bytes to file 'Hello_v49.class'
>>> Hello.write('bytes.bin')
Wrote 481 bytes to file 'bytes.bin'
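The extension rule can be written down in a couple of lines. This sketch only reproduces the behavior visible in the examples above; how jallie actually decides is not documented here, and output_name is a hypothetical helper.

```python
import os.path

def output_name(name):
    """Append '.class' when the given name has no extension of its own."""
    root, extension = os.path.splitext(name)
    return name if extension else name + ".class"

print(output_name("Hello_v49"))
print(output_name("bytes.bin"))
```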
Settings
There are a few user-controllable settings that can be used to change the default
behavior. These settings are contained in the Settings class and can be
assigned to at any time.
Setting           | Default | Description
extraDetailLevels | 0       | The number of additional levels of detail to display
indent            | ' '     | The string used for each level of indentation
verbose           | True    | Verbosity level. Set to False to reduce system messages
For example:
>>> Settings.extraDetailLevels=2
>>> Settings.indent='"""'
>>> print Hello.methods[1]
method_info {
"""u2 access_flags = ( ACC_PUBLIC, ACC_STATIC )
"""u2 name_index = #11 // "main"
"""u2 descriptor_index = #12 // "([Ljava/lang/String;)V"
"""u2 attributes_count = 1
"""attribute_info attributes[1] {
""""""[ 0 ] attribute_info {
"""""""""u2 attribute_name_index = #9 // "Code"
"""""""""u4 attribute_length = 37
"""""""""u2 max_stack = 2
"""""""""u2 max_locals = 1
"""""""""u4 code_length = 9
"""""""""bytecode code[4] = { ... }
"""""""""u2 exception_table_length = 0
"""""""""exception_table_entry exception_table[0] = { ... }
"""""""""u2 attributes_count = 1
"""""""""attribute_info attributes[1] = { ... }
""""""}
"""}
}
Scripting
Jallie can also be used to bulk-process classfiles. This can be used for summarizing,
searching, or algorithmically modifying classfiles. To use the processing capability,
create a module with a module-level function named for_ClassFile which takes two arguments:
a ClassFile object and an optional command-line parameter.
When the module name is passed to the --processor flag, instead of entering
interactive mode, jallie will parse each classfile on its command line and call the
for_ClassFile function with each classfile as an argument. If a --processor_arg
value is specified on the command line, it is passed as the second argument to for_ClassFile.
For example:
> cat version.py
def for_ClassFile(cf, arg):
    # Called once per classfile; arg is the --processor_arg value.
    print '%s %s: %d.%d: num methods: %d' % \
        ( arg, cf.classname(), cf.major_version, cf.minor_version, len(cf.methods) )
> jallie -q --processor=version --processor_arg=asdf Hello LockExample 2>/dev/null
asdf Hello: 51.0: num methods: 2
asdf LockExample: 49.0: num methods: 4
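As a second sketch of a processor, the module below reports only the classes whose major_version is above a threshold passed with --processor_arg. It relies on several assumptions: that the argument arrives as a string, that major_version compares like an integer (the arithmetic examples earlier suggest it does), and that a missing argument is falsy. StubClassFile is a hypothetical stand-in so the function can be tried outside jallie. The print call is written with a single parenthesized argument so that it behaves the same under both Python 2 and Python 3.

```python
def describe_newer_than(cf, threshold):
    """Return a report line if cf has a major_version above threshold."""
    if cf.major_version > threshold:
        return "%s: major_version %d exceeds %d" % (
            cf.classname(), cf.major_version, threshold)
    return None

def for_ClassFile(cf, arg):
    # Assumption: arg is a string, or a falsy value when no argument is given.
    threshold = int(arg) if arg else 49
    line = describe_newer_than(cf, threshold)
    if line is not None:
        print(line)

class StubClassFile(object):
    """Stand-in used only to exercise the function outside jallie."""
    def __init__(self, name, major_version):
        self._name = name
        self.major_version = major_version

    def classname(self):
        return self._name

for_ClassFile(StubClassFile("Hello", 50), "49")
for_ClassFile(StubClassFile("LockExample", 49), "49")
```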