Thursday, October 06, 2011

Volatile variables and their semantics in C/C++ vs Java

Most people think that volatile variables are for concurrency. In some sense, this intuition is correct. However, the semantics of volatile variables differ across programming languages.

In C/C++, volatile variables prevent the compiler from performing optimizations that may falsify the semantics of the code.

For example:

x = 0;
while (x == 0) ; /* busy-wait until another thread changes x */

In the above piece of code, an optimizing compiler, seeing that nothing inside the loop modifies x, may translate the program into

x = 0;
while (true) ;

However, this optimization is dangerous because x might be changed by another thread. Declaring x as "volatile" (volatile int x;) prevents the optimization: the compiler must then re-read x from memory on every access.

(Ref http://en.wikipedia.org/wiki/Volatile_variable)

In contrast, the volatile keyword in Java helps maintain ordering (happens-before relationships) of accesses to variables. Conceptually, each Java thread maintains its own "local" copy of the "main" memory and, for efficiency, only updates main memory when needed. Different threads may therefore have different views of the same variable. To ensure consistency, the volatile keyword can be used to force a flush every time a thread accesses the variable. As a result, the value of the variable in main memory is always up to date, and all threads accessing a volatile variable see a consistent value.
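
Here is a small illustration of my own (a sketch, not from the references in this post): without volatile on the flag, the worker thread's read of stop may be hoisted out of the loop and the loop may never terminate; with volatile, the main thread's write is guaranteed to become visible.

public class VolatileFlag {

    private static volatile boolean stop = false; // try removing "volatile"

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                while (!stop) {
                    // busy-wait until the main thread sets stop
                }
                System.out.println("worker observed stop = true");
            }
        });
        worker.start();
        Thread.sleep(100); // let the worker spin for a moment
        stop = true;       // guaranteed visible to the worker (volatile write)
        worker.join();
    }
}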

There are important differences between the volatile and synchronized keywords. A volatile access only synchronizes the variable's own value: when a thread accesses a volatile variable, only that value is flushed to main memory. Entering or leaving a synchronized region (either a method or a block of code), in contrast, flushes all variables in the thread's local memory. Therefore, synchronized incurs more overhead than volatile.

(Ref http://www.javaperformancetuning.com/news/qotm030.shtml)
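
One caveat worth adding (my own sketch, not from the reference above): volatile guarantees visibility but not atomicity of compound operations such as count++, whereas synchronized guarantees both.

public class Counter {

    private volatile int volatileCount = 0;
    private int syncCount = 0;

    // Unsafe under concurrent callers: volatileCount++ is a read-modify-write,
    // so two threads may read the same value before either writes back.
    public void unsafeIncrement() {
        volatileCount++;
    }

    // Safe: the synchronized method makes the increment atomic, and
    // entering/leaving it flushes the thread's working memory.
    public synchronized void safeIncrement() {
        syncCount++;
    }
}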


Thursday, August 11, 2011

What are fixed points (fixpoints)?

I happened to find a very good explanation of a term that is not easy to understand: fixed point.

http://deron.meranda.us/ruminations/fixpoints/

Check it out to see whether it furthers your knowledge.
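
As a quick illustration of my own (not from the linked article): a fixed point of a function f is a value x such that f(x) = x. For example, cos has a fixed point near 0.739, which fixed-point iteration finds by repeatedly applying the function:

public class FixedPoint {
    public static void main(String[] args) {
        double x = 1.0; // arbitrary starting guess
        for (int i = 0; i < 100; i++) {
            x = Math.cos(x); // iterate x := cos(x) until it converges
        }
        System.out.println(x); // ~0.7390851, where cos(x) = x
    }
}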


Thursday, June 02, 2011

Tips and Rules for Parallel Programming

"Nine Tips to Parallel Programming Heaven" offers simple tips for parallel programming. These rules cover many aspects of parallel programming, such as the cost-efficiency trade-off, reusability, debugging, performance profiling, and load balancing. Rule #7 is especially pragmatic: "Stop when it is good enough". It rings true: what we ultimately need is performance, so if we can achieve acceptable performance, why spend more human resources on it?

"8 Simple Rules for Designing Threaded Applications" tells us 8 practical rules needed for multi-threaded programs. They are worth learning by heart.


Wednesday, June 01, 2011

SSH login without password

People using Linux or Unix will be very familiar with ssh. However, typing a password every time you access a remote machine is tedious and error-prone, especially if you have a complex password. The link below is a very clear guide on how to get rid of this hassle ^_^

http://www.linuxproblem.org/art_9.html
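
In short, the recipe boils down to a few commands (a sketch of the standard approach; user@remote-host is a placeholder, and ssh-copy-id is a convenience wrapper for the manual key-copying step the guide describes):

$ ssh-keygen -t rsa              # generate a key pair; accept the defaults
$ ssh-copy-id user@remote-host   # append your public key to the remote authorized_keys
$ ssh user@remote-host           # now logs in without asking for a password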

Sunday, May 29, 2011

[Latex] Math formula deleted: Insufficient symbol fonts

This LaTeX error is common, but the fix is rather simple: install texlive-fonts-extra and the style will work.

[Ubuntu]$ sudo apt-get install texlive-fonts-extra

^_^

Algorithm Improvement via Performance Analysis: Parallel Merge

I am impressed by how parallel algorithms can be designed incrementally via performance analysis. This article illustrates the idea using Parallel Merge as an example:
http://drdobbs.com/high-performance-computing/229204454

To summarize,
"For parallel algorithms, divide-and-conquer proved to be a good strategy, especially coupled with the hybrid approach, enabling parallel processing with high performance. A fast sequential merge was shown to be a critical component of the fast parallel hybrid algorithm. "

The result is a compromise among the divide-and-conquer strategy, memory access patterns, task-switching overhead, and memory bandwidth consumption. One could even argue that big-O notation is no longer a good approximation of the performance of parallel algorithms.
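
To make the idea concrete, below is a minimal sketch of such a divide-and-conquer merge of my own (not the article's code; the class name and the CUTOFF value are assumptions) using Java's fork/join framework: split the larger input at its median, binary-search that value's rank in the other input, merge the two halves in parallel, and fall back to a fast sequential merge below the cutoff.

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMerge extends RecursiveAction {

    private static final int CUTOFF = 8192; // assumed tuning parameter

    private final int[] a, b, out;
    private final int a1, a2, b1, b2, o; // merge a[a1..a2) and b[b1..b2) into out[o..)

    public ParallelMerge(int[] a, int a1, int a2,
                         int[] b, int b1, int b2, int[] out, int o) {
        this.a = a; this.a1 = a1; this.a2 = a2;
        this.b = b; this.b1 = b1; this.b2 = b2;
        this.out = out; this.o = o;
    }

    @Override
    protected void compute() {
        int n1 = a2 - a1, n2 = b2 - b1;
        if (n1 < n2) { // make "a" the larger input
            new ParallelMerge(b, b1, b2, a, a1, a2, out, o).compute();
            return;
        }
        if (n1 + n2 <= CUTOFF) { // small problem: fast sequential merge
            sequentialMerge();
            return;
        }
        int mid = (a1 + a2) >>> 1;                 // median of the larger array
        int split = lowerBound(b, b1, b2, a[mid]); // its rank in the smaller array
        int outMid = o + (mid - a1) + (split - b1);
        out[outMid] = a[mid];
        invokeAll(new ParallelMerge(a, a1, mid, b, b1, split, out, o),
                  new ParallelMerge(a, mid + 1, a2, b, split, b2, out, outMid + 1));
    }

    private void sequentialMerge() {
        int i = a1, j = b1, k = o;
        while (i < a2 && j < b2) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < a2) out[k++] = a[i++];
        while (j < b2) out[k++] = b[j++];
    }

    private static int lowerBound(int[] arr, int lo, int hi, int key) {
        while (lo < hi) {
            int m = (lo + hi) >>> 1;
            if (arr[m] < key) lo = m + 1; else hi = m;
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] a = { 1, 3, 5, 7, 9 }, b = { 2, 4, 6, 8 };
        int[] out = new int[a.length + b.length];
        new ForkJoinPool().invoke(new ParallelMerge(a, 0, a.length, b, 0, b.length, out, 0));
        System.out.println(Arrays.toString(out)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}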

Tuesday, March 22, 2011

Problem with Peterson's mutual exclusion algorithm on multi-core

This is my Java program for mutual exclusion using Peterson's algorithm. The algorithm works effectively on single-core machines, but it goes wrong when running on multi-core systems. The reason is that multi-core processors reorder instructions to improve performance. In this example, mutual exclusion may be violated because of reordering between the two statements "in[id] = true;" and "turn = other();".

/* Demonstration: Peterson's algorithm for mutual exclusion does not work on multi-processors.
Precondition: 2 threads update a shared variable "sum" in the critical section; initially sum = 0.
Postcondition: sum = 2.
However, sum can be 0, 1, or 2.
*/

public class Peterson implements Runnable {

    // Shared state: the "in" flags are plain (non-volatile) booleans,
    // while "turn" and "sum" are volatile.
    private static boolean[] in = { false, false };
    private static volatile int turn = -1;
    private static volatile int sum = 0;

    public static void main(String[] args) {
        new Thread(new Peterson(0), "Thread - 0").start();
        new Thread(new Peterson(1), "Thread - 1").start();
        // Note: main does not join the threads, so sum may also be
        // printed before both threads have finished.
        System.out.println(sum); // expected 2
    }

    private final int id;

    public Peterson(int i) {
        id = i;
    }

    private int other() {
        return id == 0 ? 1 : 0;
    }

    @Override
    public void run() {
        for (int i = 0; i < 1; i++) {
            in[id] = true;  // announce intent to enter
            turn = other(); // give priority to the other thread
            while (in[other()] && turn == other()) {
                // busy-wait until it is our turn
                // System.out.println("[" + id + "] - Waiting...");
            }

            sum++; // critical section

            in[id] = false; // leave the critical section
        }
    }
}

Below are my experimental results from running on a 48-core machine.

Number of runs: 1 000 000
sum=0: 116556
sum=1: 882538
sum=2: 906

As we can see, our expected result (sum=2) holds in only 906 runs (~0.1%).

This problem is widely known in the multi-core community. Mine is just a toy experiment to demonstrate the danger of applying sequential algorithms to parallel problems.
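
For completeness, here is a sketch of one possible fix (my own, not part of the experiment above): give every shared access volatile semantics, e.g. by replacing the plain boolean flags with AtomicBoolean, whose get/set behave like volatile reads and writes. This restores the ordering that Peterson's algorithm assumes, and joining the threads before printing makes the final value reliable.

import java.util.concurrent.atomic.AtomicBoolean;

public class PetersonFixed implements Runnable {

    private static final AtomicBoolean[] in =
            { new AtomicBoolean(false), new AtomicBoolean(false) };
    private static volatile int turn = -1;
    private static int sum = 0; // protected by the lock itself

    private final int id;

    public PetersonFixed(int i) {
        id = i;
    }

    private int other() {
        return id == 0 ? 1 : 0;
    }

    @Override
    public void run() {
        in[id].set(true);  // announce intent to enter (volatile write)
        turn = other();    // give priority to the other thread
        while (in[other()].get() && turn == other()) {
            // busy-wait until it is our turn
        }
        sum++;             // critical section
        in[id].set(false); // leave the critical section
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(new PetersonFixed(0));
        Thread t1 = new Thread(new PetersonFixed(1));
        t0.start();
        t1.start();
        t0.join(); // wait for both threads before reading sum
        t1.join();
        System.out.println(sum); // always 2
    }
}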
