GIMP and multiple processors (2005) — GIMP Development

Toggle history

Please log in to manage your subscriptions.
GIMP and multiple processors	Sven Neumann	12 Feb 14:09
GIMP and multiple processors	Sven Neumann	13 Feb 22:10
GIMP and multiple processors	Daniel Egger	13 Feb 22:18
GIMP and multiple processors	Sven Neumann	13 Feb 22:43
GIMP and multiple processors	Daniel Egger	13 Feb 23:06
GIMP and multiple processors	Nathan Summers	15 Feb 22:29
GIMP and multiple processors	Nathan Summers	15 Feb 22:32
GIMP and multiple processors	Sven Neumann	16 Feb 01:57
GIMP and multiple processors	Sven Neumann	16 Feb 22:42
GIMP and multiple processors	Daniel Egger	18 Feb 11:51
GIMP and multiple processors	Sven Neumann	20 Feb 14:09
GIMP and multiple processors	Daniel Egger	20 Feb 17:17
GIMP and multiple processors	Kevin Cozens	20 Feb 17:51
GIMP and multiple processors	Daniel Egger	20 Feb 21:43
GIMP and multiple processors	Sven Neumann	20 Feb 21:55
GIMP and multiple processors	Daniel Egger	20 Feb 22:27
GIMP and multiple processors	Sven Neumann	20 Feb 22:55
GIMP and multiple processors	Marc) (A.) (Lehmann	20 Feb 23:47
GIMP and multiple processors	Daniel Egger	21 Feb 00:12
GIMP and multiple processors	Adam D. Moss	21 Feb 00:52
GIMP and multiple processors	Tino Schwarze	21 Feb 09:14
GIMP and multiple processors	David Bonnell	21 Feb 12:28
GIMP and multiple processors	Marc) (A.) (Lehmann	21 Feb 13:45
GIMP and multiple processors	David Bonnell	22 Feb 00:18
GIMP and multiple processors	Jay Cox	21 Feb 19:07
GIMP and multiple processors	Daniel Egger	21 Feb 21:34
GIMP and multiple processors	Jay Cox	22 Feb 07:52
GIMP and multiple processors	David Bonnell	21 Feb 01:10
GIMP and multiple processors	Sven Neumann	21 Feb 02:00
GIMP and multiple processors	David Bonnell	21 Feb 03:47
GIMP and multiple processors	Sven Neumann	22 Feb 21:27
GIMP and multiple processors	John Cupitt	21 Feb 14:02
GIMP and multiple processors	Sven Neumann	22 Feb 21:29
GIMP and multiple processors	Marc) (A.) (Lehmann	21 Feb 03:14
GIMP and multiple processors	Daniel Egger	21 Feb 09:43
GIMP and multiple processors	Daniel Egger	20 Feb 23:50
GIMP and multiple processors	Jay Cox	26 Feb 02:44
GIMP and multiple processors	Sven Neumann	26 Feb 12:10
GIMP and multiple processors	Daniel Egger	26 Feb 19:07
GIMP and multiple processors	Sven Neumann	27 Feb 01:04
GIMP and multiple processors	Daniel Egger	27 Feb 14:17
GIMP and multiple processors	Sven Neumann	27 Feb 14:31
GIMP and multiple processors	Daniel Egger	27 Feb 14:59
GIMP and multiple processors	Sven Neumann	27 Feb 15:19
GIMP and multiple processors	Daniel Egger	27 Feb 15:41
GIMP and multiple processors	Sven Neumann	27 Feb 17:24
GIMP and multiple processors	Daniel Egger	27 Feb 17:32
GIMP and multiple processors	Daniel Egger	27 Feb 15:53
GIMP and multiple processors	Jay Cox	27 Feb 23:28
GIMP and multiple processors	Sven Neumann	28 Feb 00:25
GIMP and multiple processors	Daniel Egger	28 Feb 02:11
GIMP and multiple processors	GSR - FR	28 Feb 16:01
GIMP and multiple processors	Sven Neumann	28 Feb 21:15
GIMP and multiple processors	Daniel Egger	28 Feb 21:33
GIMP and multiple processors	Adam D. Moss	01 Mar 01:44
GIMP and multiple processors	Jay Cox	28 Feb 09:34
GIMP and multiple processors	Sven Neumann	01 Mar 00:35

GIMP and multiple processors

Hi,

since hyperthreading and multi-core CPUs are becoming more and more common, I think we should put a little more focus on making use of these features. For that reason, I have made --enable-mp the default in CVS HEAD and also changed the gimprc preference so that GIMP uses 2 processors. This default doesn't have necessarily have to make it into the next stable release. The main concern now should be to make sure that it works at all.

I had a look at the current implementation and also cleaned up the code a little. We are talking about app/base/pixel-processor.[ch]. There seems to be room for improvement. I'd like to hack on this myself but my time is limited, so I thought I'd ask if someone else would like to take this job. So here are some ideas of what could be changed:

- Port the thread code to g_thread functions to make it more portable.

That would probably be necessary since we don't want to maintain our own thread abstraction layer. It would mean that we would have to call g_thread_init(). I am not sure if that would introduce any noticeable overhead on systems that only use a single CPU. Might be worth to investigate that.

- Use GThreadPool instead of continuously starting new threads.

Creating a thread has a considerable overhead. The current implementation continuously creates new threads that finish after they have done their job. GLib provides a GThreadPool that would probably fit nicely here.

As you can see, this is a rather small task coding-wise. It would however be good to also do some benchmarking to find out how much overhead is introduced by using the threaded pixel-processor. We don't want to make things worse for single-CPU systems.

Any volunteers?

Sven

GIMP and multiple processors

Hi,

a quick followup to myself...

I couldn't resist and spent some time on the threaded pixel-processor code . The first part of the TODO I posted yesterday is done, the code has been ported to gthread. This makes the thread functionality available on all platforms supported by gthread-2.0.

I have also cleaned up the code further while porting it to gthread. It seems to be working rather well now, but I haven't done any benchmarking so far. There's now a #define to control the ratio between created threads and tiles that are to be processed. This parameter definitely needs tuning.

Also tried to port the code to GThreadPool but it turned out to be not as trivial as I expected. The current code blocks until all threads are returned and it is not trivial to implement this behaviour with GThreadPool. So this part, the more interesting part of the TODO, is still open.

I don't want to put more time into this, but I think it would definitely make sense to introduce a thread pool to cut down the number of thread creations. Any volunteers?

Sven

GIMP and multiple processors

On 13.02.2005, at 22:10, Sven Neumann wrote:

I couldn't resist and spent some time on the threaded pixel-processor code . The first part of the TODO I posted yesterday is done, the code has been ported to gthread. This makes the thread functionality available on all platforms supported by gthread-2.0.

If you need someone to run benchmark tests, let me know. I've a dual-Opteron under my desk, a dual-G4 and a dual-G5 at my disposal, a HT P4 machine in my near, and can easily fetch a dual-SPARC workstation oder some bigger irons if needed.

Servus,
Daniel

GIMP and multiple processors

Hi,

Daniel Egger writes:

On 13.02.2005, at 22:10, Sven Neumann wrote:
I couldn't resist and spent some time on the threaded pixel-processor code . The first part of the TODO I posted yesterday is done, the code has been ported to gthread. This makes the thread functionality available on all platforms supported by gthread-2.0.
If you need someone to run benchmark tests, let me know. I've a dual-Opteron under my desk, a dual-G4 and a dual-G5 at my disposal, a HT P4 machine in my near, and can easily fetch a dual-SPARC workstation oder some bigger irons if needed.

Doing benchmarks is exactly what would have to be done at this point. The main problem is probably what to use as a benchmark. One could write a script-fu and run that using batch-mode.

It would be interesting to have numbers that show the effect of the "num-processors" gimprc option and it would be interesting to see how the value of the TILES_PER_THREAD #define influences the performance. So far I can only guess that the cost of thread creation is perhaps a problem.

Sven

GIMP and multiple processors

On 13.02.2005, at 22:43, Sven Neumann wrote:

Doing benchmarks is exactly what would have to be done at this point. The main problem is probably what to use as a benchmark. One could write a script-fu and run that using batch-mode.

I've been using the coffee stain effect fu in the past but this is *way* to random to deliver sensible results for timing tests; it is somewhat useful for profiling though which is not what we would want to do in this case, though.

It would be interesting to have numbers that show the effect of the "num-processors" gimprc option and it would be interesting to see how the value of the TILES_PER_THREAD #define influences the performance. So far I can only guess that the cost of thread creation is perhaps a problem.

If anyone can deliver some good benchmark I'd be happy to run it on a few machines with the CVS version....

Servus, Daniel

GIMP and multiple processors

On Sun, 13 Feb 2005 22:10:21 +0100, Sven Neumann wrote:

Also tried to port the code to GThreadPool but it turned out to be not as trivial as I expected. The current code blocks until all threads are returned and it is not trivial to implement this behaviour with GThreadPool. So this part, the more interesting part of the TODO, is still open.
I don't want to put more time into this, but I think it would definitely make sense to introduce a thread pool to cut down the number of thread creations. Any volunteers?

Here's a solution that would work: (note, untested code, error checking and unintersting stuff omitted for clarity, etc)

Calling code: struct SynchStuff {
GThreadPool *pool;
GCond *cond;
GMutex *mutex;
} synch;

typedef struct SynchStuff SynchStuff;

synch.cond = g_cond_new(); synch.mutex = g_mutex_new();

/* create an exclusive threadpool */ synch.pool = g_thread_pool_new(gimp_thread_func, &synch, ...);

/* make sure that all the tasks are in the queue before checking if queue is empty.
*/

g_mutex_lock(synch.mutex);

for (/* each task */ ...) { g_thread_pool_push(sych.pool, task, ....); }

/* alll threads are now launched. allow threads to test queue for emptiness * and wait for signal that all tasks are complete */

g_cond_wait (synch.cond, synch.mutex);

/* all tasks are now complete */

g_mutex_unlock(synch.mutex);

void gimp_thread_func(GimpTaskData *task, SynchStuff *synch) {

/* process task */ .
.
.

/* wait until all tasks have been queued */ g_mutex_lock(synch->mutex);

if (g_thread_pool_get_num_threads(synch->pool) == 1) g_cond_signal(synch->cond, synch->mutex);

g_mutex_unlock(mutex); }

We could also try to talk the glib developers into including g_thread_pool_join, which would block until the task pool is empty. It would be nice if the GThreadPool API were modelled on OpenMP to the extent possible, which I admit is not really stellar because of the compiler modifications OpenMP requires. But perhaps some of what OpenMP does with compiler modifications we can do with macros.

Rockwalrus

GIMP and multiple processors

On Tue, 15 Feb 2005 16:29:00 -0500, Nathan Summers wrote:

if (g_thread_pool_get_num_threads(synch->pool) == 1)

correction: I meant if (g_thread_pool_get_num_threads(synch->pool) == 1 && g_thread_pool_unprocessed(synch->pool) == 0)

although the first test is sufficient if n(tasks) >= n(threads)

Rockwalrus

GIMP and multiple processors

Hi,

Nathan Summers writes:

Here's a solution that would work: (note, untested code, error checking and unintersting stuff omitted for clarity, etc)

Yes, that would most probably work.

struct SynchStuff {
GThreadPool *pool;
GCond *cond;
GMutex *mutex;
} synch;
typedef struct SynchStuff SynchStuff;

I don't think we need a SynchStuff struct though. These can very well be global static variables. The thread pool will be kept around for the whole lifetime of gimp anway.

Sven

GIMP and multiple processors

Hi,

I couldn't resist and changed the PixelProcessor to use a thread pool. Main motivation was to make progress callback work for the threaded case. So there's now a variant of pixel_regions_process_parallel() that takes progress function and data. This allows us to parallelize some of the slower core functions (like for example gradient blend).

It would be interesting to know if this actually gives a noticeable speedup on SMP systems. Would be nice if one of you could give this some testing. Please try to do gradient blends (w/o adaptive supersampling!) on large images. Changing the number of processors in the preferences dialog allows you to switch between using the thread pool and the single-threaded code.

Sven

GIMP and multiple processors

On 16.02.2005, at 22:42, Sven Neumann wrote:

It would be interesting to know if this actually gives a noticeable speedup on SMP systems. Would be nice if one of you could give this some testing. Please try to do gradient blends (w/o adaptive supersampling!) on large images. Changing the number of processors in the preferences dialog allows you to switch between using the thread pool and the single-threaded code.

Hm, there's still no idea floating around how to benchmark. I'd rather have some microbenchmark that delivers reproducible results than only being able to say "hey, it feels faster"...

It think this writing some GIMPbench would be a nice volunteer job and hey, if you do it, you'll get to choose the benchmark methods and all the fame... ;)

Anyone?

Servus, Daniel

GIMP and multiple processors

Hi,

Daniel Egger writes:

Hm, there's still no idea floating around how to benchmark.

There are very clear ideas on how to do it. Someone just needs to sit down and write the script (or dig out script-fu-bench.scm which is what we used to use a long time ago).

I'd rather have some microbenchmark that delivers reproducible results than only being able to say "hey, it feels faster"...

You'd still do me a favor if you would try current CVS and told me whether it feels faster or not.

Sven

GIMP and multiple processors

On 20.02.2005, at 14:09, Sven Neumann wrote:

There are very clear ideas on how to do it.

Hm, must have missed that...

Someone just needs to sit down and write the script (or dig out script-fu-bench.scm which is what we used to use a long time ago).

You'd still do me a favor if you would try current CVS and told me whether it feels faster or not.

Okee, I'm warming up the machine right now. This'll take a while though because I need to recompile quite some stuff. I'll do that in 32bit mode if you don't mind...

Servus, Daniel

GIMP and multiple processors

Sven Neumann wrote:

Daniel Egger writes:
Hm, there's still no idea floating around how to benchmark.
There are very clear ideas on how to do it. Someone just needs to sit down and write the script (or dig out script-fu-bench.scm which is what we used to use a long time ago).

It would be nice to have a copy of that old script-fu-bench.scm script. I used to have a copy with one of my versions of GIMP. It isn't in CVS and I can't find it on the net. If someone has the script I'd like to get a copy of it.

Even if we had the old script-fu-bench script around I don't think its the best choice to do the benchmarking tests which are currently needed. It is something I (or someone knowledgeable in Script-Fu or another GIMP scripting language) could write. It would be helpful to have a list of GIMP features/functions to use which would make call the code which needs to be benchmarked.

GIMP and multiple processors

On 20.02.2005, at 14:09, Sven Neumann wrote:

You'd still do me a favor if you would try current CVS and told me whether it feels faster or not.

It's slower, measurable and reproducible slower.

As a benchmark I used a gradient fill in a 3000x3000px (68.8M) image. I get consistently times of 8s for 1 thread and between 9.2s and 9.6s for 2 threads. With a running application, after a restart -- doesn't matter.

What is strange though, is that it only seems two use one CPU for both threads; maybe a stupid gthread implementation?

Servus, Daniel

GIMP and multiple processors

Hi,

Daniel Egger writes:

It's slower, measurable and reproducible slower.
As a benchmark I used a gradient fill in a 3000x3000px (68.8M) image. I get consistently times of 8s for 1 thread and between 9.2s and 9.6s for 2 threads. With a running application, after a restart -- doesn't matter.
What is strange though, is that it only seems two use one CPU for both threads; maybe a stupid gthread implementation?

What version did you test? What's the last ChangeLog entry?

Sven

GIMP and multiple processors

On 20.02.2005, at 21:55, Sven Neumann wrote:

As a benchmark I used a gradient fill in a 3000x3000px (68.8M) image. I get consistently times of 8s for 1 thread and between 9.2s and 9.6s for 2 threads. With a running application, after a restart -- doesn't matter.

What is strange though, is that it only seems two use one CPU for both threads; maybe a stupid gthread implementation?

What version did you test?

GIMP (r/w) CVS from a few hours ago (about the time when I wrote in that I'll do it but that it'll take a while...)

What's the last ChangeLog entry?

2005-02-20 Sven Neumann

* app/actions/context-actions.c * app/actions/context-commands.c[ch]: added actions to control the
average radius of color picker tools (bug #167765).

* app/actions/tool-options-actions.c: fixed a typo in a comment.

Servus, Daniel

GIMP and multiple processors

Hi,

Daniel Egger writes:

What is strange though, is that it only seems two use one CPU for both threads; maybe a stupid gthread implementation?

Since gthread is just a very thin wrapper around pthreads, that would mean that it's a stupid pthread implementation. To me this looks like the kernel believes that it would be better to keep the threads local than to move one to the other CPU. I wonder if perhaps the kernel is right and using two CPUs would actually cause more overhead than it's worth?

Sven

GIMP and multiple processors

On Sun, Feb 20, 2005 at 10:55:18PM +0100, Sven Neumann wrote:

mean that it's a stupid pthread implementation. To me this looks like the kernel believes that it would be better to keep the threads local than to move one to the other CPU.

Linux will not keep two threads running on a single cpu if both are ready and nothing else is running, regardless of locality etc., as the kernel lacks the tools to effectively decide wether threads should stay on a cpu or not.

(I mean, it's of course bad to interlave operations on a per-pixel basis instead of e.g. a per-tile basis, but the kernel will run the threads concurrently wether or not it gets slower).

right and using two CPUs would actually cause more overhead than it's worth?

That's quite possible, but IFF the kernel indeed keeps the two threads on a single cpu then it means that both aren't ready at the same time, e.g. due to lock contention or other things.

GIMP and multiple processors

On 20.02.2005, at 22:55, Sven Neumann wrote:

Since gthread is just a very thin wrapper around pthreads, that would mean that it's a stupid pthread implementation. To me this looks like the kernel believes that it would be better to keep the threads local than to move one to the other CPU. I wonder if perhaps the kernel is right and using two CPUs would actually cause more overhead than it's worth?

Hm, it is using NPTL, e.g. if I set it to 8 threads I'll have: egger 31344 2.1 14.5 219412 149008 pts/0 Sl+ 23:10 0:05 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 10.8 14.5 219412 149012 pts/0 Rl+ 23:10 0:26 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 11.0 14.5 219412 149012 pts/0 Rl+ 23:10 0:26 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 6.5 14.5 219412 149012 pts/0 Rl+ 23:11 0:08 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 5.7 14.5 219412 149012 pts/0 Rl+ 23:11 0:07 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 6.8 14.5 219412 149016 pts/0 Rl+ 23:11 0:08 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 6.1 14.5 219412 149016 pts/0 Rl+ 23:11 0:08 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 6.8 14.5 219412 149016 pts/0 Rl+ 23:11 0:08 /opt/gimp-mp-test/bin/gimp-2.3
egger 31344 6.2 14.5 219412 149016 pts/0 Rl+ 23:11 0:08 /opt/gimp-mp-test/bin/gimp-2.3

If I force it back to classical pthreads then it'll be even worse:
egger 31456 70.8 15.0 164232 154852 pts/0 R+ 23:32 1:53 /opt/gimp-mp-test/bin/gimp-2.3
egger 31457 0.0 15.0 164232 154852 pts/0 S+ 23:32 0:00 /opt/gimp-mp-test/bin/gimp-2.3
egger 31458 0.0 15.0 164232 154852 pts/0 S+ 23:32 0:00 /opt/gimp-mp-test/bin/gimp-2.3
egger 31459 0.0 15.0 164232 154852 pts/0 S+ 23:32 0:00 /opt/gimp-mp-test/bin/gimp-2.3

One interesting thing here certainly is that one process (== thread) is doing all the work and the rest is doing nothing in a shape fill, while in a linear fill I'll get: egger 31474 3.4 10.8 120696 111508 pts/0 S+ 23:36 0:04 /opt/gimp-mp-test/bin/gimp-2.3
egger 31475 0.0 10.8 120696 111508 pts/0 S+ 23:36 0:00 /opt/gimp-mp-test/bin/gimp-2.3
egger 31476 35.1 10.8 120696 111508 pts/0 S+ 23:36 0:48 /opt/gimp-mp-test/bin/gimp-2.3
egger 31477 35.1 10.8 120696 111508 pts/0 S+ 23:36 0:48 /opt/gimp-mp-test/bin/gimp-2.3

However those are again running on the same CPU so rather disturbing each other then crunching away.

To be continued...

Servus, Daniel

GIMP and multiple processors

On 20.02.2005, at 23:47, wrote:

Linux will not keep two threads running on a single cpu if both are ready
and nothing else is running, regardless of locality etc., as the kernel lacks the tools to effectively decide wether threads should stay on a cpu
or not.

Yes and no. I just figured out that the tools I were looking for are called schedutils and can be used to change the affinity settings of a process, i.e. pin it to some CPU or allow it to migrate as the kernel decides between a set of CPUs.

Forcing the NPTL implementation to degrade to legacy pthreads means that one thread equals one process and thus can be controlled with taskset.

Oh yes, and I just noticed that now this isn't even necessary anymore because for some reason the kernel now migrates on of the pthread processes to the other CPU automatically after a short while of processing.

(I mean, it's of course bad to interlave operations on a per-pixel basis
instead of e.g. a per-tile basis, but the kernel will run the threads concurrently wether or not it gets slower).

Certainly. Opterons are bandwidth monsters but this doesn't mean that they'll be forgiving to stupid algorithms.

That's quite possible, but IFF the kernel indeed keeps the two threads on
a single cpu then it means that both aren't ready at the same time, e.g.
due to lock contention or other things.

I can force it to use both CPUs now, but even with 200% utilization it is 2s slower to run this stupid ubenchmark than on 1 CPU without threads.

Servus, Daniel

GIMP and multiple processors

Daniel Egger wrote:

I can force it to use both CPUs now, but even with 200% utilization it is 2s slower to run this stupid ubenchmark than on 1 CPU without threads.

Just a vague guess, but the multiprocessor GIMP pixel work scheduler might* farm alternating tiles to alternating CPUs. These are reasonably likely to have been allocated together and thus sit close together in memory, causing memory contention between CPUs.

--Adam * I haven't looked at it.

GIMP and multiple processors

It sounds like the granularity of parallelism is too fine. That is, each "task" is too short and the overhead of task dispatching (your task queue processing, the kernels thread context switching, any IPC required, etc.) is longer then the duration of a single task.

I hit the same problem a decade and a half ago when I worked on distributed parallel ray tracing systems for my post graduate thesis. If each task is a pixel then you may want to consider increasing this to a (configurable size) bundle of pixels. Depending on the algorithm being parallelized the bundle may contain contiguous pixels (if processing of each pixel requires approximately uniform processor time) or a random set of pixels (if there is, or can potentially be, significant variance in per-pixel processing time).

-Dave

-----Original Message----- From: gimp-developer-bounces@lists.xcf.berkeley.edu [mailto:gimp-developer-bounces@lists.xcf.berkeley.edu] On Behalf Of Daniel Egger
Sent: Monday, 21 February 2005 9:13 AM To: pcg@goof.com ( Marc) (A.) (Lehmann ) Cc: Sven Neumann; Developer gimp-devel Subject: Re: [Gimp-developer] GIMP and multiple processors

On 20.02.2005, at 23:47, wrote:

Linux will not keep two threads running on a single cpu if both are ready
and nothing else is running, regardless of locality etc., as the kernel lacks the tools to effectively decide wether threads should stay on a cpu
or not.

Yes and no. I just figured out that the tools I were looking for are called schedutils and can be used to change the affinity settings of a process, i.e. pin it to some CPU or allow it to migrate as the kernel decides between a set of CPUs.

Forcing the NPTL implementation to degrade to legacy pthreads means that one thread equals one process and thus can be controlled with taskset.

Oh yes, and I just noticed that now this isn't even necessary anymore because for some reason the kernel now migrates on of the pthread processes to the other CPU automatically after a short while of processing.

(I mean, it's of course bad to interlave operations on a per-pixel basis
instead of e.g. a per-tile basis, but the kernel will run the threads concurrently wether or not it gets slower).

Certainly. Opterons are bandwidth monsters but this doesn't mean that they'll be forgiving to stupid algorithms.

That's quite possible, but IFF the kernel indeed keeps the two threads on
a single cpu then it means that both aren't ready at the same time, e.g.
due to lock contention or other things.

I can force it to use both CPUs now, but even with 200% utilization it is 2s slower to run this stupid ubenchmark than on 1 CPU without threads.

Servus, Daniel

GIMP and multiple processors

Hi,

"David Bonnell" writes:

It sounds like the granularity of parallelism is too fine. That is, each "task" is too short and the overhead of task dispatching (your task queue processing, the kernels thread context switching, any IPC required, etc.) is longer then the duration of a single task.
I hit the same problem a decade and a half ago when I worked on distributed parallel ray tracing systems for my post graduate thesis. If each task is a pixel then you may want to consider increasing this to a (configurable size) bundle of pixels. Depending on the algorithm being parallelized the bundle may contain contiguous pixels (if processing of each pixel requires approximately uniform processor time) or a random set of pixels (if there is, or can potentially be, significant variance in per-pixel processing time).

The task is not a single pixel but a single tile (that is usually a region of 64x64 pixels). GIMP processes pixel regions by iterating over the tiles. The multi-threaded pixel processor uses a configurable number of threads. Each thread obtains a lock on the pixel-region, takes a pointer to the next tile from the queue, releases the lock, processes the tile and starts over. This goes on until all tiles are processed. The main threads blocks until the queue is empty and all threads have finished their jobs. If a progress callback has been specified, the main thread wakes up regularily and updates the progress bar.

Sven

GIMP and multiple processors

On Mon, Feb 21, 2005 at 12:12:51AM +0100, Daniel Egger wrote:

Linux will not keep two threads running on a single cpu if both are ready
and nothing else is running, regardless of locality etc., as the kernel lacks the tools to effectively decide wether threads should stay on a cpu
or not.
Yes and no. I just figured out that the tools I were looking for are called schedutils and can be used to change the affinity settings of a process, i.e. pin it to some CPU or allow it to migrate as the kernel decides between a set of CPUs.

I should have said "unless specially instructed to do so", i.e., not by default.

Forcing the NPTL implementation to degrade to legacy

BTW, is this 2.4 or 2.6?

I can force it to use both CPUs now, but even with 200% utilization it is 2s slower to run this stupid ubenchmark than on 1 CPU without threads.

Now, that's definitely a result then :)

GIMP and multiple processors

Thanks for the clarification.

If each thread obtains an (exclusive) lock on the pixel region then the tasks will effectively be serialized and overall execution time will increase compared to a non-threaded implementation due to the threading overheads. (Queue manipulation, thread creation and context switching, etc).

The main thread should lock the pixel region before pushing the tasks onto the work. Each task may then operate under the assumption that it has exclusive access to its tile. When the last task has completed the lock can be released.

If this is already the case then please ignore as I have not had the opportunity of yet to review the code.

-Dave

-----Original Message----- From: gimp-developer-bounces@lists.xcf.berkeley.edu [mailto:gimp-developer-bounces@lists.xcf.berkeley.edu] On Behalf Of Sven Neumann
Sent: Monday, 21 February 2005 11:01 AM To: David Bonnell
Cc: Gimp-developer@lists.xcf.berkeley.edu Subject: Re: [Gimp-developer] GIMP and multiple processors

Hi,

"David Bonnell" writes:

It sounds like the granularity of parallelism is too fine. That is, each "task" is too short and the overhead of task dispatching (your task queue processing, the kernels thread context switching, any IPC required, etc.) is longer then the duration of a single task.
I hit the same problem a decade and a half ago when I worked on distributed parallel ray tracing systems for my post graduate thesis. If each task is a pixel then you may want to consider increasing this to a (configurable size) bundle of pixels. Depending on the algorithm being parallelized the bundle may contain contiguous pixels (if processing of each pixel requires approximately uniform processor time) or a random set of pixels (if there is, or can potentially be, significant variance in per-pixel processing time).

The task is not a single pixel but a single tile (that is usually a region of 64x64 pixels). GIMP processes pixel regions by iterating over the tiles. The multi-threaded pixel processor uses a configurable number of threads. Each thread obtains a lock on the pixel-region, takes a pointer to the next tile from the queue, releases the lock, processes the tile and starts over. This goes on until all tiles are processed. The main threads blocks until the queue is empty and all threads have finished their jobs. If a progress callback has been specified, the main thread wakes up regularily and updates the progress bar.

Sven

GIMP and multiple processors

On Sun, Feb 20, 2005 at 11:52:16PM +0000, Adam D. Moss wrote:

I can force it to use both CPUs now, but even with 200% utilization it is 2s slower to run this stupid ubenchmark than on 1 CPU without threads.
Just a vague guess, but the multiprocessor GIMP pixel work scheduler might* farm alternating tiles to alternating CPUs. These are reasonably likely to have been allocated together and thus sit close together in memory, causing memory contention between CPUs.

So it might make sense not to interleave tiles but let each thread start on another region of the image, like this (for 4 threads)

1xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx
2xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
3xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
4xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx

So each of the threads works on "it's own" memory at first and only when it finished it's region, it will get tiles from the remaining work. Some binary-search like algorithm could be used like this:

- figure out largest remaining part of the image (at start, it's the whole image)
- let x be the number of idle threads - if starting up,
divide into x regions
assign each thread the start of it's region else
divide into x+1 regions
assign idle threads the x+1st, x-th, x-1st etc. region (this leaves the first to the thread currently processing it) - start processing the regions
- if any thread runs out of work, restart algorithm

This should improve memory locality while distributing work across all threads and making sure that non-uniform workloads are also distributed.

Consider the above example where thread 4 has finished it's tiles:

1111xxxxxxxxxxxx xxxxxxxxxxxxxxxx
22222222xxxxxxxx
xxxxxxxxxxxxxxxx
333333333333333x
xxxxxxxxxxxxxxxx
4444444444444444
4444444444444444

It looks like 1 will need the longest time to finish it's region, so we jump in there and assign part of it's work to 4 (marked y):

1111xxxxxxxxxxxx xx4yyyyyyyyyyyyy
22222222xxxxxxxx
xxxxxxxxxxxxxxxx
333333333333333x
xxxxxxxxxxxxxxxx
4444444444444444
4444444444444444

Later, 3 finishes:

11111111xxxxxxxx xx44444yyyyyyyyy
2222222222222222
xxxxxxxxxxxxxxxx
3333333333333333
3333333333333333
4444444444444444
4444444444444444

Now, 2 has the largest region left, so half of it get's assign to 3 (marked z):

11111111xxxxxxxx
xx44444yyyyyyyyy
2222222222222222
xxxxxxxx3zzzzzzz
3333333333333333
3333333333333333
4444444444444444
4444444444444444

The state of affairs can be managed easily since there are never more regions than threads.

HTH! Tino.

PS: I don't know whether this algorithm works well in practice, I actually got the idea while replying Adam's response.

GIMP and multiple processors

On 21.02.2005, at 03:14, wrote:

Forcing the NPTL implementation to degrade to legacy

BTW, is this 2.4 or 2.6?

Linux ulli 2.6.9-k8 #31 SMP Wed Nov 3 10:58:29 CET 2004 x86_64 GNU/Linux

32bit userland from Debian unstable.

Servus, Daniel

GIMP and multiple processors

Allocating batches of tiles to each thread and dynamically re-allocating them should a thread finish early will complicate the code and will only provide very marginal speed improvement. (Each thread will have to obtain a lock on a work queue to get its next tile either way but the proposed scheme has a slightly less chance of lock contention since there are effectively multiple queues).

As far as task sequencing goes that all depends on how tile memory is allocated. To maximize locality of reference and minimize paging the goal should be to have all threads working on contiguous areas of memory at the same time where possible.

-Dave

-----Original Message----- From: gimp-developer-bounces@lists.xcf.berkeley.edu [mailto:gimp-developer-bounces@lists.xcf.berkeley.edu] On Behalf Of Tino Schwarze
Sent: Monday, 21 February 2005 6:14 PM To: gimp-developer@lists.xcf.berkeley.edu Subject: Re: [Gimp-developer] GIMP and multiple processors