Imagine that you have a flat file in CSV format with 100 million rows, which you are about to read so you can store and process the data in your app.
The data is in the format (orderId, storeIdentifier, amountDue).
What optimization can you do here?
Note that the storeIdentifier is going to be repeated a lot. Every time you read a record, split it into fields, and possibly keep it in an in-memory data structure, you create a new String object for the store identifier. So 100 million String objects will be created for storeIdentifier alone, even though you know there are only (say) 100 stores in all! That is a massive amount of wasted memory.
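A quick way to see the duplication for yourself is a small Java sketch (the class name and sample rows below are made up for illustration). Splitting two rows that mention the same store gives two strings with equal contents but two distinct objects:

```java
public class SplitCopies {
    public static void main(String[] args) {
        // Two CSV rows that refer to the same store (hypothetical sample data)
        String row1 = "1001,STORE-042,19.99";
        String row2 = "1002,STORE-042,5.00";

        // split() builds a new String object for each field it returns
        String store1 = row1.split(",")[1];
        String store2 = row2.split(",")[1];

        System.out.println(store1.equals(store2)); // true: same characters
        System.out.println(store1 == store2);      // false: two separate objects
    }
}
```

Multiply that second object by 100 million rows and the waste described above becomes clear.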
What you can do here is this: right after you have read the storeIdentifier string, intern it:
storeIdentifier = storeIdentifier.intern();
The first call for a given value puts that string into the JVM's String pool; every later call with an equal string returns the pooled instance instead of the copy you just read. The freshly split copy then has no remaining references and can be garbage-collected, so you keep only one String instance per distinct store identifier.
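Putting it together, a loader might look roughly like the minimal sketch below. It assumes Java 16+ (for records), a file with no header row and no quoted fields containing commas, and amountDue parsed as a BigDecimal. The Order record and load method are hypothetical names, and a StringReader stands in for the real file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class OrderLoader {
    // Hypothetical row type: (orderId, storeIdentifier, amountDue)
    record Order(long orderId, String storeIdentifier, BigDecimal amountDue) {}

    // Parses "orderId,storeIdentifier,amountDue" lines (assumes no header, no quoted commas)
    static List<Order> load(Reader source) {
        List<Order> orders = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                // Replace the freshly split copy with the pooled instance
                String storeIdentifier = fields[1].intern();
                orders.add(new Order(Long.parseLong(fields[0]),
                                     storeIdentifier,
                                     new BigDecimal(fields[2])));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return orders;
    }

    public static void main(String[] args) {
        String csv = "1001,STORE-042,19.99\n1002,STORE-042,5.00\n1003,STORE-007,12.50\n";
        List<Order> orders = load(new StringReader(csv));
        // Both STORE-042 orders now share one String object
        System.out.println(orders.get(0).storeIdentifier() == orders.get(1).storeIdentifier()); // true
        System.out.println(orders.size()); // 3
    }
}
```

For the real 100-million-row file you would pass something like Files.newBufferedReader(path) instead of the StringReader.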
Points to Note
- Use intern() only if you really need it: only if you know the extra instances are going to be a problem, and only if you really understand how it works.
- Very old JVMs had problems collecting interned strings, but current JVMs handle this fine, so don't worry about leaks due to a growing pool. Once all other references to an interned string are gone, the GC can collect it like any other object.
- In older HotSpot JVMs (Java 6 and earlier), interned strings go into the PermGen space, which is not part of the normal heap. If you send too many strings there, an OutOfMemoryError will hit you even though your heap may have several GB available. Since Java 7 the string pool lives in the main heap, and Java 8 removed PermGen altogether.
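- Two interned strings with equal contents are the same object, so they can be compared with == rather than .equals(). This is a bit faster, but it is rarely worth the brittle code: a single string that was never interned makes == silently return false.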
- Calling String.intern() can be a performance hit. It takes CPU cycles to maintain the pool and do the comparisons. Are you sure you are saving enough memory to make it worth the CPU? Measure. Don’t guess.
- Use String.intern() only if the set of distinct values is bounded tightly enough to be much smaller than the total number of strings you will read, as with 100 stores spread across 100 million rows.
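To illustrate the == point in the list above, here is a small sketch (hypothetical class name and sample data). The String.intern() contract guarantees that s.intern() == t.intern() whenever s.equals(t), but == breaks as soon as one side was never interned:

```java
public class InternEquality {
    public static void main(String[] args) {
        // Two equal store ids read from different rows: distinct objects
        String a = "1001,STORE-042,19.99".split(",")[1];
        String b = "1002,STORE-042,5.00".split(",")[1];

        String pooledA = a.intern();
        String pooledB = b.intern();

        // Both interned: equal contents means the same object, so == works
        System.out.println(pooledA == pooledB); // true
        // Mixing an interned and a non-interned string breaks ==
        System.out.println(pooledA == b);       // false
        // equals() is correct regardless of interning
        System.out.println(pooledA.equals(b));  // true
    }
}
```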
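One rough way to check the last rule before committing to intern() is to sample a column and compare its distinct count with its total count. The worthInterning helper and its threshold below are hypothetical, a sketch of the idea rather than an established rule; as noted above, measure before trusting any threshold:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InternHeuristic {
    // Hypothetical helper: true when distinct values are only a small fraction
    // of the sampled values (maxDistinctRatio is an arbitrary, tunable threshold)
    static boolean worthInterning(List<String> sample, double maxDistinctRatio) {
        if (sample.isEmpty()) {
            return false;
        }
        Set<String> distinct = new HashSet<>(sample);
        double ratio = (double) distinct.size() / sample.size();
        return ratio <= maxDistinctRatio;
    }

    public static void main(String[] args) {
        List<String> storeIds = List.of("STORE-042", "STORE-042", "STORE-007", "STORE-042");
        List<String> orderIds = List.of("1001", "1002", "1003", "1004");
        System.out.println(worthInterning(storeIds, 0.5)); // true  (2 distinct of 4)
        System.out.println(worthInterning(orderIds, 0.5)); // false (4 distinct of 4)
    }
}
```

Store identifiers pass because they repeat; order ids fail because each one is unique, so interning them would only add pool overhead.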