Sending JNI to reform school
Listened to the discussion about JNI on the JavaPosse today, 2-Jun-06. (Incidentally, I'm using the Posse as a way of making myself blog: I send them an email on their topic, then I expand it into a blog entry. This seems to be working in terms of getting me to do the writing, so I think I'm going to try to keep it up.)
Anyway, it was a good discussion on the podcast, but there wasn't much meat on the ways JNI could actually be reformed to be more usable, just the suggestion that it should be possible. I think I have some ideas, and that's what this post is about.
This topic is especially interesting to me because for the past several years at work I've had to deal with one particular third-party package of unmanaged native code. In fact, it is the worst code I've ever seen. AFAICT, it was written in C in either the late '80s or early '90s to support a very particular business which has absolutely mushroomed over the years. So this ancient code is supporting complex products that had not even been invented when it was written, and it has grown into a real mess. To give you the flavor of what I'm talking about, the entire API is contained in one header file that is more than 100 pages long when printed. All access to the API begins with one struct definition which takes up probably 10 of those pages and which wraps arrays of pointers to the structs and #defines that make up the rest. In fairness to the firm which makes this thing, they do wrap this API in a simplified API that most people use. Unfortunately, my particular use requires dealing with the nastiness of the original.
Following standard Java industry practice, I wrapped all of this code in a JNI layer. It's ugly, it's impossible to debug, and getting Ant to build it correctly on both Windows and Linux is like having a root canal. Having spent time with JNI before, I just assumed that nothing could be done to fix all this. But I now know that I was wrong. JNI takes a very naive, brute-force approach to the problem of native code, and there are better ways.
Since my recent job change, I have had to use the same library, only now doing the wrapping and invocation from C# running in the CLR (this library is a standard in my industry, and understanding it at a deep level makes one extremely valuable in certain organizations). For this particular problem (accessing native code from managed code), Microsoft has come up with a much nicer solution than Java's JNI. Basically, they have built a language which looks a lot like C++ but which executes inside the CLR. It usually goes by C++.NET (officially Managed Extensions for C++, revised as C++/CLI in Visual Studio 2005), presumably because it's _NOT_ C++, it only looks a lot like it.
Here are the big differences between C++.NET and standard C++ that I have seen:
a) No multiple inheritance for managed classes: C++.NET uses single inheritance and interfaces since, like the JVM, that's what the CLR supports.
b) The GC operates on C++.NET objects, but keywords have been added to sidestep the GC where necessary (as when you hold a pointer to a C struct in native code).
c) Loads of new semantics necessary to deal with a and b.
d) C++.NET objects and their methods are directly accessible from C# and VB in much the same way that Java and Groovy objects can communicate directly, i.e., you just make the call without any intermediate dispatch layer.
So why is this better than JNI? First, because you write the managed side in one language only, rather than having your code split between two languages (with one of them name-mangled) as with JNI. C++.NET directly parses and imports the headers (or rather, the header, in this case) for the native code and lets you call the functions defined there directly from the managed language. Second, writing tests becomes _way_ easier, as you can test your code directly rather than having to go through the name-mangled interface. Third, the standard debugger works on your code that exercises the external native library (it doesn't work on the third-party code itself, since that has no debug symbols anyway, but that's something you can live with). And fourth, all of this fits naturally into your IDE. If you have ever had to debug JNI code, you know exactly what I'm talking about here.
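For anyone who hasn't had the pleasure, here is roughly what "name-mangled" means. Every `native` method declared in Java has to be backed by a C function whose exported symbol name is derived mechanically from the package, class, and method names, following the rules in the JNI specification: a `Java_` prefix, then the mangled class and method names, plus a `__` and the mangled argument signature when the method is overloaded. The sketch below computes those names in Java. It is a minimal illustration, not a full implementation; the `JniNames` class, the `com.example.Vendor` class, and its `openSession`/`price` methods are all hypothetical, made up for the example.

```java
/**
 * Hypothetical sketch of JNI native-method name mangling.
 * Escapes: '/' or '.' -> '_', '_' -> "_1", ';' -> "_2", '[' -> "_3",
 * any other non-alphanumeric char -> "_0" + 4 lowercase hex digits.
 */
public final class JniNames {

    // Mangle a class name, method name, or argument signature fragment.
    static String mangle(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);                 // ASCII alphanumerics pass through
            } else if (c == '/' || c == '.') {
                out.append('_');               // package separators become '_'
            } else if (c == '_') {
                out.append("_1");              // a literal underscore must be escaped
            } else if (c == ';') {
                out.append("_2");              // end of an object type in a signature
            } else if (c == '[') {
                out.append("_3");              // array marker in a signature
            } else {
                // e.g. '$' in inner class names becomes "_00024"
                out.append("_0").append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Short name: used when the native method is not overloaded.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    // Long name: short name + "__" + mangled argument types.
    // 'descriptor' is assumed to be a full method descriptor like "(I)V".
    static String longName(String className, String methodName, String descriptor) {
        String args = descriptor.substring(descriptor.indexOf('(') + 1, descriptor.indexOf(')'));
        return shortName(className, methodName) + "__" + mangle(args);
    }

    public static void main(String[] args) {
        // prints Java_com_example_Vendor_openSession
        System.out.println(shortName("com.example.Vendor", "openSession"));
        // prints Java_com_example_Vendor_price__ILjava_lang_String_2
        System.out.println(longName("com.example.Vendor", "price", "(ILjava/lang/String;)D"));
    }
}
```

So if the hypothetical `price(int, String)` is overloaded, the C side has to export `Java_com_example_Vendor_price__ILjava_lang_String_2`, and renaming the Java package or class silently changes every one of those symbols. `javah` will generate a matching C header, but the names you test and debug against are still these.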
Now, the C++.NET language is ugly, but so are C and C++, so MS's solution just seems reasonable. Adding similar functionality to the JVM seems especially reasonable when you consider that:
a) the Sun tools group that does C++ is moving to the NetBeans platform,
b) Sun has a new emphasis on other languages running in the JVM and
c) .NET _is_ the competition for Java after all.
Someone from Sun recently remarked to me that Sun is internally divided into "C people" and "Java people", and that this is not an official division but a cultural one. So I'd like to see Sun work on a language similar to C++.NET as one way of ending that cultural divide. They have the skills internally, they have the tools, and they have the ability to experiment with JVM enhancements that could support it. And they're behind on this particular technology which, IMHO, will be needed to move the world off of unmanaged code.