<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jonisalonen.com &#187; Java</title>
	<atom:link href="http://jonisalonen.com/category/programming/java/feed/" rel="self" type="application/rss+xml" />
	<link>http://jonisalonen.com</link>
	<description>Articles on computing, mathematics, and anything in between</description>
	<lastBuildDate>Wed, 22 May 2013 12:21:45 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Java and File Names With Invalid UTF-8</title>
		<link>http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/</link>
		<comments>http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/#comments</comments>
		<pubDate>Fri, 24 Aug 2012 19:06:04 +0000</pubDate>
		<dc:creator>Joni</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://jonisalonen.com/?p=416</guid>
		<description><![CDATA[On Unix systems &#8211; Linux and OS X included &#8211; file names can be arbitrary binary data with very few limitations. This means that in order to make sense of the name a character encoding must be used. Recently UTF-8 has become the default encoding on many systems, but sometimes you have to deal with [...]]]></description>
				<content:encoded><![CDATA[<p>On Unix systems &#8211; Linux and OS X included &#8211; file names can be arbitrary binary data: any byte except the slash and the NUL byte is allowed. This means that in order to make sense of the name a character encoding must be used. Recently UTF-8 has become the default encoding on many systems, but sometimes you have to deal with files originating from older systems with names in other encodings. These files are a problem for Java programs because <code>java.io</code> treats file names as strings of Unicode characters rather than bytes, and is unable to open files with incorrectly encoded names.</p>
<h3>Example</h3>
<p>This Java program lists files from the current directory and tells you if they exist. It demonstrates that when Java encounters a file with a problematic name it does report it in <code>listFiles</code>, but any further operations on the file fail.</p>
<pre>
import java.io.File;
import java.io.IOException;

class Ls {
    public static void main(String[] args) throws IOException {
        File d = new File(".");
        for (File f : d.listFiles()) {
            System.out.printf("%s: %b\n", f.getName(), f.exists());
        }
    }
}
</pre>
<p>For example, when it encounters a file with a name encoded in latin1, this is what happens: </p>
<pre>
$ ls -b
ni\361o
$ java Ls
ni�: false
</pre>
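<p>The failure can be reproduced without touching the file system. The sketch below (the class <code>NameDecoding</code> and its methods are made up for illustration) decodes the latin1 bytes of <code>ni\361o</code> the way a UTF-8 locale would, then encodes the resulting <code>String</code> back to bytes. The JVM performs this conversion internally in native code, so this only models it. It does show why the lookup misses: the byte 0xF1 is not valid UTF-8 here, so it becomes the replacement character U+FFFD, and that character encodes to different bytes than the ones stored on disk.</p>
<pre>
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NameDecoding {
    // The latin1 (ISO-8859-1) bytes of the example name: n, i, 0xF1, o
    static final byte[] LATIN1_NAME = {0x6e, 0x69, (byte) 0xf1, 0x6f};

    // Turn raw file-name bytes into a String using the given charset
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // True if decoding and re-encoding with cs restores the original bytes,
    // i.e. the String could still be used to find the file on disk
    static boolean roundTrips(byte[] raw, Charset cs) {
        return Arrays.equals(raw, decode(raw, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1}) {
            System.out.printf("%-10s round-trips: %b%n", cs, roundTrips(LATIN1_NAME, cs));
        }
    }
}
</pre>
<p>It prints <code>false</code> for UTF-8 and <code>true</code> for ISO-8859-1.</p>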
<p>You can <a href='http://jonisalonen.com/wp-content/uploads/ls.tar'>download Ls.java with an example file</a> here.</p>
<h3>Setting the default character encoding</h3>
<p>You probably know that Java uses a &#8220;default character encoding&#8221; to convert binary data to <code>String</code>s. To read or write text using another encoding you can use an <code>InputStreamReader</code> or <code>OutputStreamWriter</code>. But for data-to-text conversions deep in the API you have no choice but to change the default encoding.</p>
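<p>For file <em>contents</em> the charset can simply be passed in. Here is a minimal sketch (class and method names are hypothetical). It reads the same four latin1 bytes through an <code>InputStreamReader</code> that is explicitly set to ISO-8859-1, whatever the default charset (also printed) happens to be:</p>
<pre>
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharset {
    // Read all characters from raw bytes with the given charset instead of the default one
    static String read(byte[] raw, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(raw), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Default charset: " + Charset.defaultCharset());
        byte[] latin1 = {0x6e, 0x69, (byte) 0xf1, 0x6f};
        System.out.println(read(latin1, StandardCharsets.ISO_8859_1));
    }
}
</pre>
<p>This does not help with file names, because <code>java.io.File</code> has no parameter for a charset.</p>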
<p>Java reads the default character encoding from the system language settings. On Unix this means the <code>LC_ALL</code>, <code>LC_CTYPE</code> and <code>LANG</code> environment variables, which take precedence in that order; changing the one that is in effect is sufficient. For example, to make Java use latin1 you could start the JVM with the following command:</p>
<pre>$ LANG=en_US.iso88591 java Ls
ni�o: true
</pre>
<p>Or, if you want all programs you start from the terminal to use this locale:</p>
<pre>export LANG=en_US.iso88591</pre>
<p>The locale <code>en_US.iso88591</code> has to be installed on the system for these to work, though. You can use the following command to list locales that are available on your system.</p>
<pre>locale -a</pre>
<h3>Defining and installing a new locale</h3>
<p>If you don&#8217;t have a locale with the appropriate encoding installed, you can define and install a new one with the <code>localedef</code> program. For example, to create a locale with the <a href="http://en.wikipedia.org/wiki/Windows-1252">Windows Western</a> character encoding you could use the following command.</p>
<pre>sudo localedef -f CP1252 -i en_US en_US.cp1252</pre>
<p>Under this locale Java can decode file names written in Windows-1252, including those that contain curly quotes or the euro sign €, which latin1 lacks.</p>
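<p>You can ask Java&#8217;s own charset tables which characters an encoding covers, using <code>CharsetEncoder.canEncode</code>. This small sketch (the class name is made up) compares latin1 with Windows-1252. Java calls the latter <code>windows-1252</code>, with <code>Cp1252</code> as an alias:</p>
<pre>
import java.nio.charset.Charset;

class Cp1252Check {
    // Can every character of s be represented in the named charset?
    static boolean canEncode(String charsetName, String s) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Curly double quotes (U+201C, U+201D) and the euro sign (U+20AC)
        String s = "\u201cprice\u201d 5 \u20ac";
        for (String cs : new String[] {"ISO-8859-1", "windows-1252"}) {
            System.out.printf("%s: %b%n", cs, canEncode(cs, s));
        }
        // In windows-1252 the euro sign is the single byte 0x80
        byte[] euro = "\u20ac".getBytes(Charset.forName("windows-1252"));
        System.out.printf("euro in windows-1252: 0x%02X%n", euro[0]);
    }
}
</pre>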
<h3>What about <code>file.encoding</code>?</h3>
<p>The <code>file.encoding</code> system property can also be used to set the default character encoding that Java uses for I/O. Unfortunately it seems to have no effect on how file names are decoded into <code>String</code>s.</p>
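<p>On OpenJDK-based JVMs, file names appear to be decoded with a separate, undocumented property, <code>sun.jnu.encoding</code>. It is derived from the locale at startup and is not changed by <code>-Dfile.encoding</code>. Because it is an implementation detail, treat this sketch (class name made up) as a diagnostic under that assumption, not a supported API:</p>
<pre>
import java.nio.charset.Charset;

class EncodingInfo {
    // Collect the encoding-related settings the running JVM ended up with
    static String describe() {
        return String.join(System.lineSeparator(),
                // Documented default for readers, writers and String.getBytes()
                "file.encoding    = " + System.getProperty("file.encoding"),
                "default charset  = " + Charset.defaultCharset(),
                // Undocumented OpenJDK property used for file names; on Unix it is
                // taken from the locale; it may be null on other JVMs
                "sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
</pre>
<p>Comparing <code>LANG=en_US.iso88591 java EncodingInfo</code> with <code>java -Dfile.encoding=ISO-8859-1 EncodingInfo</code> should show which value each setting moves. The exact values depend on the JDK version; for example, JDK 18 made UTF-8 the default for <code>file.encoding</code>.</p>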
]]></content:encoded>
			<wfw:commentRss>http://jonisalonen.com/2012/java-and-file-names-with-invalid-characters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calling C From Java Is Easy</title>
		<link>http://jonisalonen.com/2012/calling-c-from-java-is-easy/</link>
		<comments>http://jonisalonen.com/2012/calling-c-from-java-is-easy/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 07:00:25 +0000</pubDate>
		<dc:creator>Joni</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[JNI]]></category>
		<category><![CDATA[Tutorials]]></category>

		<guid isPermaLink="false">http://jonisalonen.com/?p=282</guid>
		<description><![CDATA[Sometimes we need to access operating system functions that the standard Java API doesn&#8217;t expose, or use non-Java libraries. Although it&#8217;s well known that you can call this &#8220;native code&#8221; from Java using JNI, there is not so much entry-level material on how it&#8217;s actually done. It is often left out of introductory material&#8211;including the official [...]]]></description>
				<content:encoded><![CDATA[<p>Sometimes we need to access operating system functions that the standard Java API doesn&#8217;t expose, or use non-Java libraries. Although it&#8217;s well known that you can call this &#8220;native code&#8221; from Java using JNI, there is not so much entry-level material on how it&#8217;s actually done. It is often left out of introductory material&#8211;including the official Java Tutorial. Here I hope to give a short introduction to get you started.</p>
<p>Programming in JNI starts with defining a class with methods declared as <code>native</code>. Next you generate a C header file that declares the functions implementing those methods. Then you define these functions in a separate file and compile it into a shared library. In this tutorial we&#8217;ll use Linux and the GCC compiler.</p>
<p>Suppose we need to know the name of the terminal device from which the JVM was launched. (A slightly silly example since <a title="Runtime.exec with Unix console programs" href="http://jonisalonen.com/2012/runtime-exec-with-unix-console-programs/">you could just use <code>/dev/tty</code></a>.) From C you would do this by calling the POSIX functions <code>isatty</code> and <code>ttyname</code>. Let&#8217;s make a class that gives us access to them:</p>
<pre>package ex;
public class TTYUtil {
    static { System.loadLibrary("ttyutil"); }
    public static native boolean isTTY();
    public static native String getTTYName();
}</pre>
<p>The call to <code>System.loadLibrary</code> in the class initializer looks for a shared library and links it to the JVM. The file name depends on the operating system: on Windows this code would look for <code>ttyutil.dll</code>, on Linux and Solaris <code>libttyutil.so</code>.</p>
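<p>You don&#8217;t have to guess this mapping, because <code>System.mapLibraryName</code> exposes it. A tiny sketch (class name made up) prints the file name that <code>loadLibrary("ttyutil")</code> searches for on the current platform:</p>
<pre>
class LibName {
    public static void main(String[] args) {
        // Platform-specific file name for the library "ttyutil",
        // e.g. libttyutil.so on Linux or ttyutil.dll on Windows
        System.out.println(System.mapLibraryName("ttyutil"));
    }
}
</pre>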
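<p>The next step is compiling the Java code and generating the C header file (<code>javah</code> was removed in JDK 10; on JDK 8 and later, <code>javac -h . ex/TTYUtil.java</code> does both in one step):</p>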
<pre>$ javac ex/TTYUtil.java
$ javah ex.TTYUtil</pre>
<p>The <code>javah</code> tool will create a C header file called <code>ex_TTYUtil.h</code>. This file contains the declarations for the C functions we have to define:</p>
<pre>JNIEXPORT jboolean JNICALL Java_ex_TTYUtil_isTTY
  (JNIEnv *, jclass);

JNIEXPORT jstring JNICALL Java_ex_TTYUtil_getTTYName
  (JNIEnv *, jclass);</pre>
<p>The idea is that the header file should be generated automatically during the build process so you should not modify it manually. If you need other declarations or <code>#include</code> statements you should use a different file.</p>
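<p>The function names in the header follow the JNI naming rules. Each name is <code>Java_</code>, then the mangled fully qualified class name, then an underscore, then the mangled method name. The sketch below reimplements this &#8220;short name&#8221; rule to show where names like <code>Java_ex_TTYUtil_isTTY</code> come from. It is an illustration with made-up class and method names. It leaves out the &#8220;long name&#8221; used for overloaded native methods, which appends two underscores and the mangled argument signature:</p>
<pre>
class JniNames {
    private static final String SAFE =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Escape a class or method name following the JNI name-mangling rules
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SAFE.indexOf(c) != -1) {
                sb.append(c);                    // ASCII letters and digits stay as-is
            } else if (c == '.' || c == '/') {
                sb.append('_');                  // package separator
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else {
                // Any other character, e.g. '$' in nested class names: _0 + 4 lowercase hex digits
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Short name of the C function implementing a native method
    static String shortName(String className, String method) {
        return "Java_" + mangle(className) + "_" + mangle(method);
    }

    public static void main(String[] args) {
        System.out.println(shortName("ex.TTYUtil", "isTTY"));
        System.out.println(shortName("ex.TTYUtil", "getTTYName"));
    }
}
</pre>
<p>For example, a nested class <code>Outer$Inner</code> in package <code>a.my_pkg</code> with a native method <code>get_x</code> maps to <code>Java_a_my_1pkg_Outer_00024Inner_get_1x</code>.</p>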
<p>Create <code>ex_TTYUtil.c</code> to define the C functions:</p>
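<pre>#include "ex_TTYUtil.h"
#include &lt;unistd.h&gt;

/* Returns the name of the terminal connected to stdout, or null if there is none */
JNIEXPORT jstring JNICALL Java_ex_TTYUtil_getTTYName
  (JNIEnv *env, jclass cls)
{
    char *name = ttyname(STDOUT_FILENO);
    if (name == NULL) {
        return NULL;  /* the Java caller receives null */
    }
    /* Copy the C string into a new Java String */
    return (*env)-&gt;NewStringUTF(env, name);
}

/* Returns JNI_TRUE if stdout is connected to a terminal */
JNIEXPORT jboolean JNICALL Java_ex_TTYUtil_isTTY
  (JNIEnv *env, jclass cls)
{
    return isatty(STDOUT_FILENO) ? JNI_TRUE : JNI_FALSE;
}</pre>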
<p>These functions receive the JNI environment pointer as the first argument. The second argument is the class for static native methods and the object for non-static methods. The remaining arguments, if any, are the method arguments from Java. The JNI environment is used for interacting with the virtual machine; here, for example, the <code>NewStringUTF</code> function creates a new Java <code>String</code> object from a C string.</p>
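<p>To compile and link the C code you can use the following commands. The second <code>-I</code> option is needed because <code>jni.h</code> includes the platform-specific <code>jni_md.h</code>, which lives in <code>include/linux</code> on Linux.</p>
<pre>$ gcc -fPIC -c ex_TTYUtil.c -I $JAVA_HOME/include -I $JAVA_HOME/include/linux
$ gcc ex_TTYUtil.o -shared -o libttyutil.so -Wl,-soname,ttyutil</pre>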
<p>Now you should have <code>libttyutil.so</code> in your working directory. Let&#8217;s try using the library.</p>
<pre>import ex.TTYUtil;
public class Test {
    public static void main(String[] args) {
        if (TTYUtil.isTTY()) {
            System.out.println("TTY: "+TTYUtil.getTTYName());
        } else {
            System.out.println("Not a TTY");
        }
    }
}</pre>
<p>Compile this class like you normally would and then run it:</p>
<pre>$ export LD_LIBRARY_PATH=.
$ java Test
TTY: /dev/pts/3</pre>
<p>And what if the output is not connected to a terminal?</p>
<pre>$ java Test | cat
Not a TTY</pre>
<p>The JVM looks for native libraries in the directories listed in the system property <code>java.library.path</code>; on Linux, HotSpot initializes this property from <code>LD_LIBRARY_PATH</code> followed by the standard system library directories. Here we made the shared library available to the JVM temporarily by adding the current directory to <code>LD_LIBRARY_PATH</code>. To permanently install a shared library on Linux you would copy the <code>.so</code> to <code>/usr/lib</code> (or any other directory mentioned in <code>/etc/ld.so.conf</code>) and then run <code>ldconfig</code>.</p>
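<p>When <code>loadLibrary</code> fails with an <code>UnsatisfiedLinkError</code>, it can help to check what the JVM is actually searching. This sketch (names are hypothetical) walks <code>java.library.path</code> and looks for the platform-specific file name. It only approximates the JVM&#8217;s own search:</p>
<pre>
import java.io.File;

class FindLib {
    // First directory on java.library.path that contains the platform-specific
    // file for libName (e.g. libttyutil.so on Linux), or null if none does
    static File find(String libName) {
        String fileName = System.mapLibraryName(libName);
        String path = System.getProperty("java.library.path", "");
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries, e.g. from a trailing separator
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String name = args.length == 0 ? "ttyutil" : args[0];
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        File lib = find(name);
        System.out.println(lib == null ? name + " not found" : "found " + lib);
    }
}
</pre>
<p>If you run it as <code>java -Djava.library.path=. FindLib</code> in the directory that contains <code>libttyutil.so</code>, it should report <code>./libttyutil.so</code>. The same <code>-D</code> option is an alternative to setting <code>LD_LIBRARY_PATH</code> for the <code>Test</code> program.</p>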
]]></content:encoded>
			<wfw:commentRss>http://jonisalonen.com/2012/calling-c-from-java-is-easy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Runtime.exec with Unix console programs</title>
		<link>http://jonisalonen.com/2012/runtime-exec-with-unix-console-programs/</link>
		<comments>http://jonisalonen.com/2012/runtime-exec-with-unix-console-programs/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 07:00:36 +0000</pubDate>
		<dc:creator>Joni</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Console]]></category>
		<category><![CDATA[Linux tips]]></category>
		<category><![CDATA[Runtime.exec]]></category>
		<category><![CDATA[Sample code]]></category>

		<guid isPermaLink="false">http://jonisalonen.com/?p=274</guid>
		<description><![CDATA[Ever wanted to launch less or vi from a console Java program to show or edit a file, only to find that they won&#8217;t work like when you launch them from the terminal? The problem is that these programs need to communicate with a TTY (teletypewriter) device to find the screen size and to be [...]]]></description>
				<content:encoded><![CDATA[<p>Ever wanted to launch <code>less</code> or <code>vi</code> from a console Java program to show or edit a file, only to find that they won&#8217;t work like when you launch them from the terminal?</p>
<p>The problem is that these programs need to communicate with a TTY (teletypewriter) device to find the screen size and to be able to write anywhere on the terminal window. You don&#8217;t notice this when using them from a terminal because the shell sets up their stdin and stdout so they are connected to the terminal.</p>
<p>When you launch a program from Java using <code>Runtime.exec</code>, its <code>stdin</code> and <code>stdout</code> are connected to <em>pipes</em> handled by the JVM, not to a TTY device: it is as if you tried to launch less with something like <code>less file.txt &lt;jvm.in &gt;jvm.out</code>. Needless to say, that wouldn&#8217;t work even from a terminal.</p>
<p>What you can do is redirect the <code>stdin</code> and <code>stdout</code> streams <em>back</em> to the original terminal device. To find the actual TTY device we would have to call the POSIX <code>ttyname</code> function with JNI, but luckily that&#8217;s not necessary: we can use <code>/dev/tty</code>, which is the <a href="http://tldp.org/HOWTO/Text-Terminal-HOWTO-7.html#ss7.3">controlling terminal for the current process</a>.</p>
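<p>Since Java 7, <code>ProcessBuilder</code> can perform this redirection itself, with no shell involved. Below is a minimal sketch; the class and method names are made up. It assumes a Unix system and a JVM that was started from a terminal. Without a controlling terminal, opening <code>/dev/tty</code> fails and <code>start()</code> throws an <code>IOException</code>.</p>
<pre>
import java.io.File;
import java.io.IOException;

class TtyLauncher {
    static final File TTY = new File("/dev/tty");

    // Connect stdin and stdout of cmd to the controlling terminal instead of
    // pipes owned by the JVM; stderr goes wherever the JVM's stderr goes
    static ProcessBuilder onTerminal(String... cmd) {
        return new ProcessBuilder(cmd)
                .redirectInput(TTY)
                .redirectOutput(TTY)
                .redirectError(ProcessBuilder.Redirect.INHERIT);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: java TtyLauncher FILE");
            return;
        }
        Process p = onTerminal("less", args[0]).start();
        System.out.println("less exited with status " + p.waitFor());
    }
}
</pre>
<p>Replacing <code>less</code> with <code>vi</code> should let the user edit the file in the same way.</p>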
<p>An interesting application of this is to use <code>less</code> as a pager to show lengthy messages to a user, like database result sets:</p>
<pre>
import java.io.OutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        Process p = Runtime.getRuntime().exec(new String[] {"sh", "-c",
                "less &gt;/dev/tty"});
        OutputStream out = p.getOutputStream();
        out.write("Lengthy message".getBytes());
        out.close();
        System.out.println("=&gt; "+p.waitFor());
    }
}</pre>
]]></content:encoded>
			<wfw:commentRss>http://jonisalonen.com/2012/runtime-exec-with-unix-console-programs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The JVM ate my variable!</title>
		<link>http://jonisalonen.com/2012/can-your-interpreter-do-this/</link>
		<comments>http://jonisalonen.com/2012/can-your-interpreter-do-this/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 11:00:49 +0000</pubDate>
		<dc:creator>Joni</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[GC]]></category>
		<category><![CDATA[Hotspot]]></category>
		<category><![CDATA[JIT]]></category>
		<category><![CDATA[OSR]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://jonisalonen.com/?p=234</guid>
		<description><![CDATA[Consider this code: Object obj = new Object(); WeakReference&#60;Object&#62; ref = new WeakReference&#60;Object&#62;(obj); List&#60;byte[]&#62; filler = new LinkedList&#60;byte[]&#62;(); while (ref.get() != null) { filler.add(new byte[1000]); } System.out.println("Filler size " + filler.size()); When executed it should always run out of memory: ref.get() will not return null because the referenced object cannot be garbage collected. The local [...]]]></description>
				<content:encoded><![CDATA[<p>Consider this code:</p>
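<pre>import java.lang.ref.WeakReference;
import java.util.LinkedList;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        Object obj = new Object();
        WeakReference&lt;Object&gt; ref = new WeakReference&lt;Object&gt;(obj);

        List&lt;byte[]&gt; filler = new LinkedList&lt;byte[]&gt;();
        while (ref.get() != null) {
            filler.add(new byte[1000]);
        }
        System.out.println("Filler size " + filler.size());
    }
}</pre>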
<p>When executed it should always run out of memory: <code>ref.get()</code> will not return null because the referenced object cannot be garbage collected. The local variable <code>obj</code> still holds a reference to it. The filler increases in size indefinitely, and the program will ultimately crash with an OutOfMemoryError.</p>
<p>What really happens is this:</p>
<pre>$ java Test
Filler size 28186</pre>
<p>Wait, what?!</p>
<p>It turns out that the Hotspot virtual machine analyzes the code, sees that the variable <code>obj</code> is not used after the <code>while</code>-loop, and rewrites the method <em>while it is running</em> so that the local variable is effectively removed. This is called <strong>OSR (On Stack Replacement)</strong> compilation in the Hotspot VM. If you added something like <code>print(obj)</code> after the loop you would get the expected OOME. Quoting Kris Mok in <a href="https://gist.github.com/1165804#file_notes.md">About Printcompilation</a>:</p>
<blockquote><p>OSR in HotSpot is used to help improve performance of Java methods stuck in loops [6]. Without OSR, a method running in the interpreter can&#8217;t transfer to its compiled version even if there is one available, until the next time this method is invoked. With OSR, though, a Java method with long-running loops can run in the interpreter, trigger an OSR compilation in one of its loops, keep running in the interpreter until the compilation completes, and jump right into the compiled version without having to wait for &#8220;the next invocation&#8221;.</p></blockquote>
<p>The process of &#8220;jumping right into the compiled version&#8221; may sound simple but in reality is anything but. The new method body does not start from the beginning but from the &#8220;back edge&#8221; of the running loop. The stack frame created by the interpreter is replaced by the one created by the JIT compiler. It is this process that is capable of removing local variables from the method.</p>
<p><a href="http://java.sun.com/docs/books/jls/third_edition/html/execution.html#12.6.1">JLS 12.6.1</a> explicitly allows this:</p>
<blockquote><p>Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.</p></blockquote>
<p>This behaviour is really unreliable: the thresholds for compiling code depend on the hardware and VM options. If you make the filler grow faster by allocating bigger byte arrays, you get an OOME after all: the loop has to run enough iterations for the compiler to be triggered.</p>
<p>On-Stack-Replacement can affect benchmarks in unexpected ways: the code you are running changes during the test. <a href="http://www.azulsystems.com/blog/cliff/2011-11-22-what-the-heck-is-osr-and-why-is-it-bad-or-good">Cliff Click</a> has written on this subject.</p>
<p>It also influences finalizers: since <code>obj</code> is a local variable, you would expect that its finalizer would not be called at least until the method returns. But since the object can be GC&#8217;d before the method ends, it can be finalized before then as well. The lesson is <strong>you can&#8217;t depend in <em>any</em> way on when objects are finalized</strong> &#8211; it may be <em>sooner</em> than you think!</p>
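<p>Since Java 9, well after this was written, there is an explicit way to keep an object reachable up to a given point: <code>java.lang.ref.Reference.reachabilityFence</code>. Unlike a dummy <code>print(obj)</code>, it states exactly what is intended. The sketch below assumes Java 9 or later, and its class and method names are made up. It follows the try/finally pattern from the method&#8217;s documentation:</p>
<pre>
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class KeepAlive {
    // Requests garbage collection several times while obj must stay reachable,
    // then reports whether the weak reference still points to it
    static boolean survives(int gcRequests) {
        Object obj = new Object();
        WeakReference&lt;Object&gt; ref = new WeakReference&lt;&gt;(obj);
        try {
            for (int i = 0; i &lt; gcRequests; i++) {
                System.gc();
            }
            return ref.get() != null;
        } finally {
            // obj stays strongly reachable at least until this call, so it
            // cannot be collected (or finalized) any earlier
            Reference.reachabilityFence(obj);
        }
    }

    public static void main(String[] args) {
        System.out.println("still reachable: " + survives(3));
    }
}
</pre>
<p>It should always print <code>true</code>. Without the fence, the JLS rule quoted above would allow the object to be collected before <code>ref.get()</code> is called.</p>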
<p>If you are wondering if you can see when OSR happens, you can use the <code>-XX:+PrintCompilation</code> flag:</p>
<pre>$ java -XX:+PrintCompilation Test
    125   1       java.lang.String::hashCode (60 bytes)
    133   2       sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (490 bytes)
    158   3       java.lang.String::charAt (33 bytes)
    159   4       java.lang.String::indexOf (151 bytes)
    174   5       java.lang.Object::&lt;init&gt; (1 bytes)
    182   6       java.util.LinkedList$Entry::&lt;init&gt; (20 bytes)
    183   7       java.util.LinkedList::add (12 bytes)
    185   8       java.util.LinkedList::addBefore (52 bytes)
    348   1 %     Test::main @ 25 (78 bytes)
Filler size 28082</pre>
<p>The last line tells us that Hotspot compiled the Test.main method with OSR (the <code>%</code>-flag) 348ms into the execution.</p>
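<p>To watch OSR on a smaller scale, any method that is invoked once but loops for a long time will do. Here is a sketch to run with <code>java -XX:+PrintCompilation OsrDemo</code>; the names and the iteration count are arbitrary choices. A line with the <code>%</code> flag for <code>OsrDemo::sum</code> should appear. Whether and when it does depends on the JVM version and its compilation thresholds.</p>
<pre>
class OsrDemo {
    // Sums i % 7 for i = 0 .. n-1; invoked only once, so the running loop can
    // switch to compiled code only through on-stack replacement
    static long sum(int n) {
        long total = 0;
        for (int i = 0; i &lt; n; i++) {
            total += i % 7;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(100_000_000));
    }
}
</pre>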
<p>(Inspired by the Stackoverflow question <a href="http://stackoverflow.com/questions/8818424/are-weakhashmap-cleared-during-a-full-gc">Are WeakHashMap Cleared During A Full GC?</a> Thanks to berry120 and jalopaba for contributing detailed answers.)</p>
]]></content:encoded>
			<wfw:commentRss>http://jonisalonen.com/2012/can-your-interpreter-do-this/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
