This document explains the general principles of using the JNI to link C libraries and the specific requirements for linking to HDF4 and HDF5. The JNI specification and code is standard for all platforms, but compiling and runtime linking are different for each virtual machine and platform. These issues are complex and not well documented.
Package | Description |
Java HDF Interface (JHI) | Classes and JNI code to access the HDF4 library. |
Java HDF5 Interface (JHI5) | Classes and JNI code to access the HDF5 library. |
It is extremely important to emphasize that these packages are not a pure Java implementation of the HDF libraries. The JHI or JHI5 calls the same HDF libraries that are used by C or FORTRAN programs.
Each Java HDF Interface consists of Java classes and dynamically linked native libraries. The Java classes declare static native methods, and the library contains C functions which implement the native methods. The C functions call the standard HDF library, which is linked as part of the same library on most platforms.
Each Java HDF Interface is intended to be the standard interface for accessing HDF files from Java programs. The JHI is a foundation upon which application classes can be built, such as the hdfobject package.
The Java language defines its own virtual machine and instruction set. This design makes Java easy to program and very portable. However, Java programs cannot use code written in any language except Java. In order to provide a minimal level of interoperability with other languages, the Java Native Interface (JNI) defines a standard for invoking C and C++ subroutines as Java methods (declared 'native'). The Java Virtual Machine (JVM) implements a platform-specific mechanism to load C or C++ libraries, and to call the native methods on behalf of the Java programs.
So, to access a C library from a Java program, four tasks must be completed:
When the Java program runs, the JNI glue and C library must be in the 'java.library.path' path.
The JNI standard defines the syntax for the Java native methods, and the
C and C++ for the glue layer. This standard is universal, the same code should
work on any platform.
Compiling the JNI is specific to platform, compiler, and Java Virtual Machine. The challenge is to create a C library that can be dynamically loaded by the JVM. On each platform, specific flags must be used with the C or C++ compiler, to create a library that the Java VM can successfully load. The same C code is used on all platforms, but the build process is different for each platform.
The rules for deployment are roughly standard across all platforms, i.e., the JNI libraries must be in the library load path for the Java application. However, the precise behavior of the Java VM is not standardized and, indeed, varies across platforms, and even different versions of a JVM on the same platform.
First, the JVM is a multi-threaded program. The C code in a C library will be executed in a thread, with multiple threads active. The C code must be thread safe, and in some cases may need to synchronize to prevent disastrous bugs. However, the precise requirements on the C code depend on the implementation of the specific Java VM. Not only is there no general specification for the C code, the thread-safety requirements are different for different Java VMs--and in most cases there is no way to know what the C code must do.
Second, the JNI layer is vulnerable to errors because neither Java nor C can check the correctness of Java-to-C interface. Also, any errors in the C code can create grave and mysterious crashes in the JVM. In particular, if the C code overwrites memory that it shouldn't, it can corrupt the state of the JVM, producing indecipherable errors in the Java program.
Third, the JVM dynamically links to C or C++ libraries. This process is similar to the way that dynamic libraries are implemented for C programs, but the Java VM does not necessarily follow the same rules as the C runtime. The precise requirements on the library and behavior of the loader is specific to the JVM, and different platforms exhibit different behavior. Furthermore, in order to be used throughout the JNI, the C library (and all its dependencies) must be compiled and linked with very specific C compiler and linker settings.
The native library is linked when a native method is invoked (i.e., on demand) or when the library is specifically invoked (i.e., the first access to the hdf.hdflib.HDFLibrary or hdf.hdf5lib.H5 class) with a call to System.loadLibrary("hdf") or System.loadLibrary("hdf5"). If successful, the link step is invisible to the application and user.
This
link step will fail if:
If the link fails, the application will receive an exception, such as java.lang.UnsatisfiedLinkError or
java.lang.NoClassDefFoundError.
Note that it may be difficult to determine the cause of the link failure: it might be a path problem, it might be a code generation problem (e.g., incompatible settings for libhdf.so), or it might be a missing or mis-matched symbol.
The JVM usually does not implement dependency checking and dynamic loading for the C libraries that it loads. Specifically, if the JNI library has unresolved symbols, e.g., to another dynamic library, the JVM may or may not load that library. And if multiple C libraries with mutual dependencies are manually loaded into the JVM, they may or may not be linked to resolve the references in the C code. In most JVMs, these dependencies are not implemented.
Also, some JVMs check all references at load time, rejecting the library if any symbols are unresolved. Other JVMs will load a library with unresolved symbols, and fail only when one of the missing symbols is referenced. In other words, the linking exception may occur differently with different JVMs.
Furthermore, the HDF Native Interface is not available on some platforms because the technical mysteries have not been successfully resolved.
The HDF5 library has a "thread safe" option. This option is irrelevant for the JNI and should not be used. The HDF5 thread safe option depends on P-threads, but the JVM uses it's own thread package, which usually isn't p-threads. More important, it is impossible to correctly synchronize from inside the HDF5 library.
Following the JNI specification, the recommended procedure is to use Java synchronization to create a monitor to serialize access to the library. This is done in the Java code by declaring the methods 'synchronized'. This design essentially puts a single lock around the HDF library. This is equivalent to the HDF5 thread safe option, except it is implemented by the JVM at a higher level in the call stack.
The overriding principle in this layer is that Java uses objects, and C uses
memory structures. The memory layout of a Java object cannot be accessed
through Java or the JNI. The translation can only be done by calling special C routines
that return C data structures.
The HDF JNI layer is basically a series of calls that:
These data copies may be very significant for the HDF native interface: when reading and writing large blocks of data, the data array may be in memory twice (in a Java object and in a C array), and data may be copied between the two copies. This can be a very serious performance penalty.
The C library returns error codes. The errors are detected in the JNI
layer and Java exceptions are raised. This behavior is very natural. The
calling Java program must surround the calls to the HDF libraries with appropriate
try/catch expressions.
The HDF libraries define a large number of constants to be used as parameters
to the API. The HDF JNI layer exports a set of Java constants that are mapped
to the required C values. The Java program calls the native method hdf.hdf5lib.J2C
to get the correct value for the C constants.
Similarly, HDF5 is accessed through hdf.hdf5lib.H5, which loads the HDF5 JNI:
Note that it does not work to build with shared libraries: libhdf.so, etc..
The reason is that the file loaded by the JVM (libjhdf.so) must have all the
symbols resolved, it cannot require dynamic linking to other libraries. (In
other words, you have to build a dynamic library that does not depend on
any other dynamic libraries.
The Java HDF has two parts, the Java classes and the C libraries. The Java classes are executed with a Java VM, they must be found in the CLASSPATH just as any Java classes. The C libraries are loaded when the Java classes for the JNI are initialized. The C libraries must be in the library search path for the JVM, e.g., LD_LIBRARY_PATH.
The most common problem is difficulty running the code after compilation. This is usually caused by problems with the class path or library path. When one of the jar files is missing from the class path, the program will encounter a "Class not found" exception. An "unsatisfied link" error will occur if the library with the native implementation is not found in the LD_LIBRARY_PATH.