Tuesday, December 27, 2011

Extend Thread vs implement Runnable

There are two ways to create your own thread type: subclass java.lang.Thread class, or implementing java.lang.Runnable and pass it to Thread constructor or java.util.concurrent.ThreadFactory. What is the difference, and which one is better?
  1. The practical reason is, a Java class can have only one superclass. So if your thread class extends java.lang.Thread, it cannot inherit from any other classes. This limits how you can reuse your application logic.
  2. From a design point of view, there should be a clean separation between how a task is identified and defined, between how it is executed. The former is the responsibility of a Runnalbe impl, and the latter is job of the Thread class.
  3. A Runnable instance can be passed to other libraries that accept task submission, e.g., java.util.concurrent.Executors. A Thread subclass inherits all the overhead of thread management and is hard to reuse.
  4. Their instances also have different lifecycle. Once a thread is started and completed its work, it's subject to garbage collection. An instance of Runnalbe task can be resubmitted or retried multiple times, though usually new tasks are instantiated for each submission to ease state management.

Friday, December 23, 2011

O(n) for operations on Java collections

Collection Interfaces - Java Collection Diagrams

And when we include maps:

Avoid memory leaks using Weak&Soft references

Some Java developers believe that there is no such a thing as memory leak in Java (thanks to the fabulous automatic Garbage Collection concept)

Some others had met the OutOfMemoryError and understood that the JVM has encountered some memory issue but they are not sure if it’s all about the code or maybe even an OS issue…

The OutOfMemoryError API docs reveals that it “Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector. “

As we know, the JVM has a parameter that represents the maximum heap size(-Xmx), hence we can defiantly try to increase the heap size. yet some code can generate new instances all the time, if those instances are accessible(being referenced by the main program – in a recursive manner) for the entire program life span, then the GC won’t reclaim those instances. hence the heap will keep increasing and eventually a OutOfMemoryError will be thrown <- we call that memory leak.

Our job as Java developers is to release references (that are accessible by the main program) that we won’t use in the future. by doing that we are making sure that the GC will reclaim those instances (free the memory that those instances occupying in the heap).

In some cases we reference an instance from 2 different roots. one root represent a fast-retrieval space(e.g. HashMap) and the other manages the real lifespan of that instance. Sometimes we would like to remove the reference of that instance from one root and get the other root(fast retrieval) reference removed automatically.

We wouldn’t want to do it manually due to the fact that we are not C++ developers and we wouldn’t like to manage the memory manually..

Weak references

In order to solve that we can use WeakReference.

Instances that are being referenced by only Weak references will get collected on the next collection! (Weakly reachable), in other words those references don’t protect their value from the garbage collector.

Hence if we would like to manage the life span of an instance by one reference only, we will use the WeakReference object to create all the other references.

( usage: WeakReference wr = new WeakReference(someObject);)

In some apps we would like to add all our existing references to some static list, those references should not be strong, otherwise we would have to clean those references manually, we would add those references to the list using this code.

public static void addWeakReference(Object o){
refList.add(new WeakReference(o));

since most of the WeakReferences use cases needs a Map data structure, there is an implementation of Map that add a WeakReference automatically for you – WeakHashMap

Soft References

I saw few implementations of Cache using weak references (e.g. the cache is just a WeakHashMap => the GC is cleaning old objects in the cahce), without WeakReferences naive cache can easily cause memory leaks and therefor weak references might be a solution for that.

The main problem is that the GC will clean the cached-object probably and most-likely faster then you need.

Soft references solve that, those references are exactly like weak references, yet the GC won’t claim them as fast. we can be sure that the JVM won’t throw an OutOfMemory before it will claim all the soft and weak references!

using a soft references in order to cache considered the naive generic cache solution. (poor’s men cache)

( usage:SoftReference sr = new SoftReference(someObject);)

Java Forever - Official Video

Thursday, December 22, 2011

One java - 2 compilers (Javac and JIT)

It seems like that for students there is a lot of confusion regarding how Java/The JVM works because there are TWO compilers involve, so when someone mentions a compiler or the Just In Time compiler some of them would imagine it’s the same one, the Java Compiler..

So how does it really works?

It’s simple..

1) You write Java code (file.java) which compiles to “bytecode“, this is done using the javacthe 1st compiler.

It’s well known fact that Java can be written once get compiled and run anywhere (on any platform) which mean that different types of JVM can get installed over any type of platform and read the same good old byte code

2) Upon execution of a Java program (the class file or Jar file that consists of some classes and other resources) the JVM should somehow execute the program and somehow translate it to the specific platform machine code.

In the first versions of Java, the JVM was a “stupid” interprater that executes byte-code line by line….that was extremely slow…people got mad, there were a lot of “lame-java, awesome c” talks…and the JVM guys got irratated and reinvented the JVM.

the “new” JVM initially was available as an add-on for Java 1.2 later it became the default Sun JVM (1.3+).

So what did they do? they added a second compiler.. Just In Time compiler(aka JIT)..

Instead of interpreting line by line, the JIT compiler compiles the byte-code to machine-code right after the execution..

Moreover, the JVM is getting smarter upon every release, it “knows” when it should interpat the code line-by-line and what parts of the code should get compiled beforehand (still on runtime).

It does that by taking real-usage statistics, and a long-list of super-awesome heuristics..

The JVM can get configured by the user in order to disable/enable some of those heuristics..

To summarize, In order to execute java code, you use two different compilers, the first one(javac) is generic and compiles java to bytecode, the second(jit) is platform-dependent and compiles some portions of the bytecode to machine-code in runtime!

Optimizing and Speeding up the code in Java

Finalizers: object that overwrites the finalize() method (“Called by the garbage collector on an object when garbage collection determines that there are no more references to the object.”) is slower!(for both allocation and collection) if it’s not a must, do clean ups in other ways ( e.g. in JDBC close the connection using the try-catch-finally block instead)

New is expensive: creating new heavy object() on the heap is expensive!, it’s recommended to recycle old objects (by changing their fields) or use the flyweight design pattern.

Strings : (1) Strings are immutable which mean that upon usage of the + operator between 2 strings the JVM will generate a new String(s1+s2) on the heap (expensive as I just mentioned), in order to avoid that, it’s recommended to use the StringBuffer.(Update) since JDK 1.5 was introduced  StringBuilder is a better option than Stringbuffer in a single-threaded environment.

(2) Don’t convert your strings to lower case in order to compare them, use String.equalIgnoreCase() instead.

(3) String.startswith() is more expensive than String.charat(0) for the first character.

Inline method: inline is a compiler feature, when you call a method from anywhere in your code, the compiler copies the content of the inline method and replace the line that calls the method with it.

Obviously,It saves runtime time: (1) there is no need to call a method (2) no dynamic dispatch.

In some languages you can annotate a method to be inline, yet in Java it’s impossible, it’s the compiler decision.

the compiler will consider inline a method if the method is private.

My recommendation is to search in your code for methods that are heavily used(mostly in loops) and annotate those method as private if possible.

Don’t invent the wheel: the java api is smart and sophisticated and in some cases use native implementation, code that you probably can’t compete with. unless you know what you are doing (performance wise) don’t rewrite methods that already exists in the java API. e.g. benchmarks showed that coping and array using a for loop is at least n/4 times slower than using System.arraycopy()

Reflection: reflection became much faster for those of you who use the most recent JVMs, yet using reflection is most certainly slower than not using it. which mean that you better avoid reflection if there is no need.

Synchronization: Some data structures auto-support concurrency, in case of a single thread application don’t use those to avoid overhead e.g. use ArrayList instead of a Vector

Multithreads: in case of a multi processor use threads, it will defiantly speed up your code, if you are not a thread expert some compilers know how to restructure your code to thread automatically for you. you can always read a java threads tutorial as well

Get familiar with the standard data structures:   e.g. if you need a dast for puting and retriving objects use HashMap and not an ArrayList. (O(1))

Add an id field, for faster equals(): object that contains many fields are hard to compare ( equals() wise), to avoid that add an id(unique) field to your object and overwrite the equals() method to compare ids only.

Be careful, In case  your code already works, optimizing it is a sure way to create new bugs and make your code less maintainable!

it’s highly recommended to time your method before and after an optimization.