LRU Cache implementation

This one is a very known algorithm that is often taken as example for students but it’s an interesting one.

If someone ask you to implement a LRU (Least Recently Used) Cache. How would you do it?
Since it’s a cache you need to guarantee a O(1) for reading and writing. For a fast access, hash tables are very often the right data structure so we can consider it, but we need to keep the order and hash tables cannot do that.
An LRU cache is also a FIFO (First In First Out) data structure, a queue looks very adapted too but we loose the O(1) access time.

A good approach is to use both:

  • An hash table for the O(1) lookup time
  • A queue to keep the order

The only one problem is that queues are very effective for enqueue and dequeue but very slow for random access, and since each hit has to reorder the queue, those operations would lead to a O(n) lookup time for rearranging the queue every time we access the cache.

The good strategy is to keep this approach but use a double linked list instead of a queue because:

  • Very easy to implement a queue with it
  • Still slow for random access but easy to move a given node to the head and it’s all we need

Let’s focus on the implementation. We need first to define a node that contains the pair key/value:

class Node {
    int key;
    int data;
    Node previous;
    Node next;
}

Your LRUCache definition:

class LRUCache {

    private int capacity;
    private Map<Integer, Node> data;
    private Node head;
    private Node end;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        this.data = new HashMap<>();
    }
}

Then we write an add method that append the node as the head:

    private void add(Node node) {

        // Reset position
        node.next = null;
        node.previous = null;

        // First element
        if (null == this.head) {
            this.head = node;
            this.end = node;
            return;
        }

        // Existing element
        this.head.previous = node;
        node.next = this.head;
        this.head = node;
    }

Same thing for the remove:

    private void remove(Node node) {

        // Nothing to do
        if (this.head == null || null == node) {
            return;
        }

        // The only one item
        if (this.head == this.end && this.head == node) {
            this.head = null;
            this.end = null;
            return;
        }

        // Remove from head
        if (node == this.head) {
            this.head.next.previous = null;
            this.head = this.head.next;
            return;
        }

        // Remove from end
        if (node == this.end) {
            this.end.previous.next = null;
            this.end = this.end.previous;
            return;
        }

        // Remove in the middle
        node.previous.next = node.next;
        node.next.previous = node.previous;

    }

Two methods that help to move a node to the head (for hits), and a remove last that cleanup the oldest item:

    private void moveFirst(Node node) {
        this.remove(node);
        this.add(node);
    }

    private void removeLast() {
        this.remove(this.end);
    }

The linked list is partially implemented (enough for our needs).
The LRU get method simply retrieve the key and move in to the head in the list.

public int get(int key) {

        // Existing key
        if (this.data.containsKey(key)) {

            // Move to first place
            Node node = this.data.get(key);
            this.moveFirst(node);

            // Return the value
            return node.data;
        }

        // Not found
        return -1;
    }

Last method, the set method generates an hit to move the accessed element to the head.
Like the get method, the set method deals with both hash table and linked list:

    public void set(int key, int value) {

        // Existing slot
        if (this.data.containsKey(key)) {
            Node node = this.data.get(key);
            this.moveFirst(node);
            node.data = value;
            return;
        }

        // Out of capacity, cleaning the oldest slot
        if (this.data.size() >= this.capacity) {
            int id = this.end.key;
            this.removeLast();
            this.data.remove(id);
        }

        // New slot
        Node node = new Node();
        node.key = key;
        node.data = value;
        this.add(node);
        this.data.put(key, node);
    }

Of course you probably shouldn’t implement your own cache or linked list, there are so many very efficient implementations.
In Java you can implement your LRUCache in a couple of instructions reusing java built in data structures but here we just want to understand what’s going on and write our own for learning purpose.
Notice also that this implementation is not thread safe too.

Linear sorting algorithm (bitmap sort)

Most of the time, sorting algorithms cannot have a linear execution time because of the nature of the data. For generics algorithms the best complexity is O(nlog(n)) which is not bad compared to the very trivial ones that are O(n2).

Depending on the size of your data and their nature, it may be appropriate to use more efficient algorithms and most of the time the implementation itself is tightly dependent of your needs. However it’s often the same approach that you adapt for your needs.

The easiest data to sort are numbers because their nature allows us to use intermediate data structures like arrays and/or simple binary representation to process the sorting in O(n).

Let’s say you have 1 billion integers with no dupliates and you need to sort them.
Using a generic algorithm is not acceptable because of the amount of comparison and the memory footprint would be 4 billions bytes (about 3.7Gb).

If we step back and assume that integers are between 0 and 1,000,000,000, since we know all possible numbers and the final sorted output will be an iteration from 0 to 1,000,000,000 with some missing. We only need to know whether a number is present or not.

The smallest data structure to store existence is a bit (0 or 1) that flags a given number as existing or not.
We need a very long string of bit to store 1,000,000,000 state. It’s doable by using chunks of 32 (integer) and we need 1,000,000,000 / 32 = 31250000 integers thus 119.2Mb that is way better.

Our data structure will be a simple int[31250000]. All we need is for a given read integer:

  • Find the bucket: number / 32
  • Find the bit: number % 32
  • Set the bit to 1

Fairly easy implementation:

int[] bitmap = new int[31250000];
Scanner in = new Scanner(new FileReader("data.txt"));
while(in.hasNextLine()) {
    int i = Integer.parseInt(in.nextLine());
    bitmap[i / 32] |= 1 << i % 32;
}

All we need to do is scanning the bitmap and print the number if the bit is 1:

for(int bucket = 0; bucket < bitmap.length; ++bucket) {
    int bit = 1;
    while(bitmap[bucket] != 0) {
        if ((bitmap[bucket] & 1) == 1) {
            System.out.println((32 * bucket) + bit - 1);
        }
    bitmap[bucket] >>>= 1;
    ++bit;
    }
}

Let’s say we want to print only the missing number, we can keep almost everything as it:

for(int bucket = 0; bucket < bitmap.length; ++bucket) {
    int bit = 1;
    while(bitmap[bucket] != 0) {
        if ((bitmap[bucket] & 1) == 0) { // only look for not existing this time.
            System.out.println((32 * bucket) + bit - 1);
        }
    bitmap[bucket] >>>= 1;
    ++bit;
    }
}

Now we still consider 1 billion integers but from a range [1,000,000 to 1,999,999], we have 1,000,000 possible number. The bitmap would be 1,000,000 / 32 = 31250 (122Kb). That means yes we can sort 4 billions integers with about 100 kb memory in a linear time. Notice that any duplicates will be removed that might be a problem. In this case count sort might help.

Of course this cannot resolve everything and it’s effective because the numbers are well known and the missing ones are few comparing to the existing ones.
If you don’t know the range or if most of them are missing in your input, the bitmap will be mostly empty and it’s memory wasting and you need another algorithm like radix sort that have a complexity of O(kn), k being the digit numbers).

If the bitmap is too huge to fit the memory, we can keep everything as it except that we’ll apply a merge sort with a bitmap sort. We would create intermediate files (output of the bitmap sort) and then merge them to the final output.

Maven classpath isolation

I remember some issues I had with maven about classpath due to classpath isolation.
For exemple if you try to use stax in unit test, you could be disturbed because maven already use it.

Maven 3.x should fix this kind of problems but it doesn’t look like changes effective at compile time yet.
You still can get this issue using velocity 1.7 with an annotation processor and run this processor through unit test compiling.

The output is :

Caused by: org.apache.velocity.exception.VelocityException: The specified logger class org.apache.velocity.runtime.log.CommonsLogLogChute does not implement the org.apache.velocity.runtime.log.LogChute interface.
	at org.apache.velocity.runtime.log.LogManager.createLogChute(LogManager.java:181)
	... 46 more

Got from maven 3.0.3 compiling :

┌─[defrancea@~/eXo/wikbook-annotations/template]
└─>mvn --version
Listening for transport dt_socket at address: 5005
Apache Maven 3.0.3 (r1075438; 2011-03-01 00:31:09+0700)
Maven home: /usr/share/maven
Java version: 1.6.0_26, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.6.8", arch: "x86_64", family: "mac"

Actually maven 3.0.3 use velocity 1.5 and some refactoring was done since this version on the API.
You can either use velocity 1.5 or use another template engine. My choice was to use freemarker (sorry velocity guys).

Groovy and me, a long love story.

For many days I was trying to adapt some Plain old java test (POJT ?) on my groovy portage of Chromattic Framework (Object mapper to JCR).
This adaptation use the MOP, it’s a cool way to have a good chromattic integration.
But after quick test based on GroovyClassLoader, I have to port all the existing tests to my adaptation, but nothing work !
After several long hours of work in my code and original code, I have just found the real problem : Groovy !
When we say “groovy is dynamic, MOP is beautiful, MOP is cool” it’s not groovy but groovy in specific context.
Groovy called method have the same mechanism of method resolution of Java …
MOP feature seem available only if the call are made by groovy code …

Check this example :

class A {
 public Integer m() { return 3 }
 def invokeMethod(String m, Object p) { return 42 }
}

and Java unit test :

public class GoofyTestCase extends TestCase {
 public void testGroofy() throws Exception {
   assertEquals(42, new GroovyShell().evaluate("import org.chromattic.groovy.metamodel.B; new B().m()")); // true
   assertEquals((Object) 3, new B().m()); // true
 }
}

MOP is used only when the call are made in groovy script.
Do the Dynamic resolution be made by groovy or the groovy shell ?

Groovy setter generation

Today, I have to generate setter to annote it at the compilation time.

I’ve previously generate getter as following :

classNode.addMethod(
      GroovyUtils.getsetName(GroovyUtils.GetSet.GET, fieldNode.getName())
      , Modifier.PUBLIC
      , fieldNode.getType()
      , new Parameter[]{}
      , new ClassNode[]{}
      , new ReturnStatement(new FieldExpression(fieldNode))
    );

But If I try to generate setter :

    classNode.addMethod(
      GroovyUtils.getsetName(GroovyUtils.GetSet.SET, fieldNode.getName())
      , Modifier.PUBLIC
      , new ClassNode (Void.TYPE)
      , new Parameter[]{ new Parameter(fieldNode.getType(), "value") }
      , new ClassNode[]{}
      , new ExpressionStatement(new BinaryExpression(new PropertyExpression(new VariableExpression("this"), fieldNode.getName()), Token.newSymbol(Types.EQUAL, 0, 0), new VariableExpression("value")))
    );

and … nothing, no setter generated …

It’s time to check groovy sources code and I had found :

    public MethodNode getSetterMethod(String setterName) {
        for (Object o : getDeclaredMethods(setterName)) {
            MethodNode method = (MethodNode) o;
            if (setterName.equals(method.getName())
                    && ClassHelper.VOID_TYPE==method.getReturnType()
                    && method.getParameters().length == 1) {
                return method;
            }
        }
        ClassNode parent = getSuperClass();
        if (parent!=null) return parent.getSetterMethod(setterName);
        return null;
    }
    VOID_TYPE = new ClassNode(Void.TYPE)

… Groovy developers use a new instance of ClassNode and use == operator. they compare memory address (the instance) but not the value.

Just changing “new ClassNode (Void.TYPE)” by “ClassHelper.VOID_TYPE” resolve my problem.

ClosureExpression with Groovy AST 1.6.5

Today I have to create closure with groovy AST, but when I use ClosureExpression with this snippet :

ClosureExpression closureExpression = new ClosureExpression (
  new Parameter[] {}
  , new ExpressionStatement(new PropertyExpression(new VariableExpression("it"), "class"))
);

I get a very explicit error :

java.lang.NullPointerException
at org.codehaus.groovy.classgen.AsmClassGenerator.createClosureClass(AsmClassGenerator.java:3601)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitClosureExpression(AsmClassGenerator.java:1559)
at org.codehaus.groovy.ast.expr.ClosureExpression.visit(ClosureExpression.java:46)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitAndAutoboxBoolean(AsmClassGenerator.java:4037)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCallSite(AsmClassGenerator.java:1965)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCall(AsmClassGenerator.java:1799)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCall(AsmClassGenerator.java:1785)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeInvokeMethodCall(AsmClassGenerator.java:1768)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitMethodCallExpression(AsmClassGenerator.java:2277)
at org.codehaus.groovy.ast.expr.MethodCallExpression.visit(MethodCallExpression.java:63)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitAndAutoboxBoolean(AsmClassGenerator.java:4037)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitCastExpression(AsmClassGenerator.java:1711)
at org.codehaus.groovy.ast.expr.CastExpression.visit(CastExpression.java:66)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitAndAutoboxBoolean(AsmClassGenerator.java:4037)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCallSite(AsmClassGenerator.java:1965)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCall(AsmClassGenerator.java:1799)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCall(AsmClassGenerator.java:1785)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeInvokeMethodCall(AsmClassGenerator.java:1768)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitMethodCallExpression(AsmClassGenerator.java:2277)
at org.codehaus.groovy.ast.expr.MethodCallExpression.visit(MethodCallExpression.java:63)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitAndAutoboxBoolean(AsmClassGenerator.java:4037)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCallSite(AsmClassGenerator.java:1941)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCall(AsmClassGenerator.java:1799)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeCall(AsmClassGenerator.java:1785)
at org.codehaus.groovy.classgen.AsmClassGenerator.makeInvokeMethodCall(AsmClassGenerator.java:1768)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitMethodCallExpression(AsmClassGenerator.java:2277)
at org.codehaus.groovy.ast.expr.MethodCallExpression.visit(MethodCallExpression.java:63)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitAndAutoboxBoolean(AsmClassGenerator.java:4037)
at org.codehaus.groovy.classgen.AsmClassGenerator.evaluateExpression(AsmClassGenerator.java:1296)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitReturnStatement(AsmClassGenerator.java:1257)
at org.codehaus.groovy.ast.stmt.ReturnStatement.visit(ReturnStatement.java:47)
at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClassCodeContainer(ClassCodeVisitorSupport.java:73)
at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitConstructorOrMethod(ClassCodeVisitorSupport.java:80)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitStdMethod(AsmClassGenerator.java:552)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitConstructorOrMethod(AsmClassGenerator.java:528)
at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitMethod(ClassCodeVisitorSupport.java:88)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitMethod(AsmClassGenerator.java:632)
at org.codehaus.groovy.ast.ClassNode.visitContents(ClassNode.java:1055)
at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClass(ClassCodeVisitorSupport.java:48)
at org.codehaus.groovy.classgen.AsmClassGenerator.visitClass(AsmClassGenerator.java:242)
at org.codehaus.groovy.control.CompilationUnit$10.call(CompilationUnit.java:718)
at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:925)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:462)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:278)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:249)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:244)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:206)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:216)
at org.chromattic.groovy.core.PlopTestCase.testPlop(PlopTestCase.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at com.intellij.junit3.JUnit3IdeaTestRunner.doRun(JUnit3IdeaTestRunner.java:108)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)

Thanks to IntelliJ idea I can set a breakpoint on NullPointerException and I come in the AsmClassGenerator.java file at the line 3731 from groovy 1.6.5 sources and I can see :

VariableScope scope = ce.getVariableScope();
Parameter[] ret = new Parameter[scope.getReferencedLocalVariablesCount()];

Why developers don’t have set the scope in ClosureExpression’s constructor of control the value of the scope to avoid the NullPointerException ?

Of course the simple following statement resolve the problem :

closure.setVariableScope(new VariableScope());

If you want to use groovy (even from java code), you must be very optimistic.

Unicode source code

One day I have read that java source code is considered as unicode text.

Today I want to try that :

$ echo "\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0041\u0020\u007B\u000A\u0020\u0020\u0020\u0020\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0076\u0029\u0020\u007B\u000A\u0020\u0020\u0020\u0020\u0020\u0020\u0020\u0020\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0070\u006F\u0075\u0065\u0074\u0022\u0029\u003B\u000A\u0020\u0020\u0020\u0020\u007D\u000A\u007D" > A.java
$ javac A.java
$ java A
pouet

The unicode string is equivalents to the following code :

public class A {
    public static void main(String[] argv) {
        System.out.println("pouet");
    }
}