
VIPS and GEGL performance and memory usage comparison

This discussion is connected to the gegl-developer-list.gnome.org mailing list which is provided by the GIMP developers and not related to gimpusers.com.

VIPS and GEGL performance and memory usage comparison Sven Claussner 28 Jan 20:58
  VIPS and GEGL performance and memory usage comparison Daniel Rogers 28 Jan 21:29
   VIPS and GEGL performance and memory usage comparison Sven Claussner 29 Jan 04:41
    VIPS and GEGL performance and memory usage comparison Øyvind Kolås 29 Jan 14:20
     VIPS and GEGL performance and memory usage comparison Daniel Rogers 29 Jan 16:37
      VIPS and GEGL performance and memory usage comparison jcupitt@gmail.com 29 Jan 17:52
      VIPS and GEGL performance and memory usage comparison Sven Claussner 01 Feb 20:40
       VIPS and GEGL performance and memory usage comparison Daniel Rogers 01 Feb 21:35
        VIPS and GEGL performance and memory usage comparison Øyvind Kolås 02 Feb 12:37
         VIPS and GEGL performance and memory usage comparison jcupitt@gmail.com 06 Feb 17:26
        VIPS and GEGL performance and memory usage comparison Sven Claussner 03 Feb 21:06
  VIPS and GEGL performance and memory usage comparison Alexandre Prokoudine 29 Jan 00:06
   VIPS and GEGL performance and memory usage comparison Sven Claussner 29 Jan 04:28
  VIPS and GEGL performance and memory usage comparison Adam Bavier 29 Jan 19:07
   VIPS and GEGL performance and memory usage comparison jcupitt@gmail.com 06 Feb 17:29
Sven Claussner
2016-01-28 20:58:21 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

Hi,

the developers of VIPS/libvips, a batch image-processing library, have a performance and memory usage comparison on their website, including a GEGL test. [1]
A few days ago I pointed out some issues with the reported GEGL tests to John Cupitt, the maintainer there.
In his reply, John points out that GEGL is a bit of an odd one out in this comparison, because it is the only interactive image processing library there. He therefore suggests removing GEGL from the list.

What do you GEGL developers think: does anybody need these results, so that GEGL should stay in this comparison, or would it be OK if John removed it from the list?

Greetings

Sven

[1] http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use

Daniel Rogers
2016-01-28 21:29:32 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

Hi Sven,

I am confused. What technical reason exists to assume gegl cannot be as fast as vips? Is it memory usage? Extra necessary calculations? Some way in which parallelism is not as possible?

-- Daniel

Alexandre Prokoudine
2016-01-29 00:06:39 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On Thu, Jan 28, 2016 at 11:58 PM, Sven Claussner wrote:

> Hi,
>
> the developers of VIPS/libvips, a batch image-processing library, have a performance and memory usage comparison on their website, including a GEGL test. [1]
> Some days ago I told John Cupitt, the maintainer there, some issues with the reported GEGL tests.
> In his answer to me John points out that GEGL is a bit odd in this comparison, because it is the only interactive image processing library there. He therefore suggests to remove GEGL from this list.

Which, of course, might remind you of https://github.com/jcupitt/gegl-vips :)

Alex

Sven Claussner
2016-01-29 04:28:22 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On 29.01.16 at 01:06 AM Alexandre Prokoudine wrote:
> Which, of course, might remind you https://github.com/jcupitt/gegl-vips :)

Thanks, Alex, for the reminder! I remembered that project; it has been discussed on and off over the years. The last status (2013) was that libvips could basically be used as a GEGL back-end, but still needed area invalidation (see John's post from 10.11.13 on this list). At that time John said it might take one or two years to implement if nobody volunteered for the job.
Looking at the tremendously higher (batch processing) performance of VIPS compared to GEGL (with the tile back-end, I assume), I personally would appreciate a VIPS back-end very much.

Greetings

Sven

Sven Claussner
2016-01-29 04:41:54 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On 28.1.2016 at 10:29 PM Daniel Rogers wrote:
> Hi Sven,
>
> I am confused. What technical reason exists to assume gegl cannot be as fast as vips? Is it memory usage? Extra necessary calculations? Some way in which parallelism is not as possible?

Hi Daniel,

you might have misunderstood me. The performance comparison only shows that VIPS outperforms GEGL at least in this test. Technical reasons can be found here: http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use

In a mail John explained the differences to me: "Gegl is really targeting interactive applications, not batch processing, and it's doing a lot of work that no one else is doing, like conversion to scRGB, transparency, caching, and so on."

I didn't claim that GEGL couldn't be as fast as VIPS. It might become much faster than it is now by using VIPS as a library. This is why there is gegl-vips, a VIPS-based GEGL back-end.

You'll find some more information by searching this list for VIPS, for mails from John Cupitt and Nicolas Robidoux, or for GEGL's performance in general.

Greetings

Sven

Øyvind Kolås
2016-01-29 14:20:14 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On Fri, Jan 29, 2016 at 5:41 AM, Sven Claussner wrote:

> On 28.1.2016 at 10:29 PM Daniel Rogers wrote:
>> I am confused. What technical reason exists to assume gegl cannot be as fast as vips? Is it memory usage? Extra necessary calculations? Some way in which parallelism is not as possible?
>
> you might have misunderstood me. The performance comparison only shows that VIPS outperforms GEGL at least in this test. Technical reasons can be found here: http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use
>
> In a mail John explained the differences to me: "Gegl is really targeting interactive applications, not batch processing, and it's doing a lot of work that no one else is doing, like conversion to scRGB, transparency, caching, and so on."

GEGL does single-precision 32-bit floating point processing for all operations, and thus should not introduce the kind of quantization problems that 8bpc/16bpc pipelines introduce across multiple filters, at the expense of much higher memory bandwidth. The GEGL tile cache size (and swap backend) should be tuned when doing benchmarks. If this benchmark is similar to the one done years ago, VIPS was being tested with a hard-coded 8bpc 3x3 sharpening filter, while GEGL was rigged up to use an unsharp mask built as a composite meta-operation pipeline from gaussian blur and compositing filters in floating point. These factors are probably more of a cause of the slow-down than the startup time spent loading all the plug-in shared objects, which still takes more than a second per started GEGL process on my machine.
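
A minimal sketch of the quantization point, in plain Python rather than GEGL or VIPS code. It assumes a hypothetical chain of four gain filters whose product is exactly 1.0 (the gains are powers of two so that the float path is exact), and compares rounding to 8 bits after every filter with rounding once at the end:

```python
# Hypothetical illustration only: these helpers are not GEGL or VIPS API.

def to_8bit(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))

def run_chain(value, gains, quantize):
    """Apply each gain in turn, optionally rounding to 8 bits after every step."""
    for g in gains:
        value = value * g
        if quantize:
            value = to_8bit(value)
    return value

# Assumed filter chain: darken strongly, then undo it in three steps.
gains = [0.125, 2.0, 2.0, 2.0]
ramp = list(range(256))

out_8bit = [run_chain(v, gains, quantize=True) for v in ramp]
out_float = [to_8bit(run_chain(float(v), gains, quantize=False)) for v in ramp]

distinct_8bit = len(set(out_8bit))    # grey levels that survive the 8-bit chain
distinct_float = len(set(out_float))  # grey levels that survive the float chain
max_err_8bit = max(abs(a - b) for a, b in zip(out_8bit, ramp))
max_err_float = max(abs(a - b) for a, b in zip(out_float, ramp))
print(distinct_8bit, distinct_float, max_err_8bit, max_err_float)
```

With these assumed gains the float chain returns the input ramp unchanged, while the 8-bit chain collapses many neighbouring grey levels into one. Real filter chains are less extreme, but the loss accumulates in the same way.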

/pippin

Daniel Rogers
2016-01-29 16:37:34 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On Jan 29, 2016 6:20 AM, "Øyvind Kolås" wrote:

> GEGL is doing single precision 32bit floating point processing for all operations, thus should not introduce the type of quantization problems 8bpc/16bpc pipelines introduce for multiple filters - at the expense of much higher memory bandwidth - the GEGL tile cache size (and swap backend) should be tuned if doing benchmarks. If this benchmark is similar to one done years ago, VIPS was being tested with a hard-coded 8bpc 3x3 sharpening filter while GEGL was rigged up to use a composite meta operation pipeline based unsharp mask using gaussian blur and compositing filters in floating point. These factors are probably more a cause of slow-down than the startup time loading all the plug-in shared objects, which still takes more than a second on my machine per started GEGL process.

Ah, so this is interesting. I feel that rather than removing gegl from that list of benchmarks, it would be better to build more benchmarks, especially ones that call out all the advantages of gegl, e.g. minimal updates, deep pipeline accuracy, etc.

It is worth calling out gegl's limitations and being honest about them for three reasons. First, they are not fundamental to the design of gegl; just having a vips backend proves that. Second, gegl really can learn from a lot of the tricks vips does, and having benchmarks that do not look so good is a great way to call out opportunities for improvement. And third, benchmarks help users make good decisions about whether gegl is a good fit for their needs. Transparency is one of the deeply valuable benefits of open source.

Technical projects that I feel this benchmark and the discussion about it inspire:

- Gegl could load plugins in a more demand driven way, reducing startup costs.
- Gegl could have multiple pipelines optimized for different use cases.
- A fast 8 bit pipeline is great for previews or single operation stacks, or when accuracy is not as important for the user.
- Better threading, including better I/O pipelining is a great idea to lift from vips.
- Anyone can do dynamic compilation nowadays with llvm. Imagine taking the gegl dynamic tree, and compiling it into a single LLVM dynamically compiled function.

So if any of the above actually appear in patch sets, then we have, at least partially, this benchmark to thank for motivating that. I can see ways in which any one of the above projects can benefit GIMP as well.

And in terms of transparency and user benefit, the vips developers' benchmark also makes me think that there really should be a set of benchmarks that call out the concrete user benefits of gegl, e.g. higher accuracy, especially for deep pipelines. If these benefits exist, it must be possible to measure them and show how gegl truly beats everyone else in its areas of focus.

In a very real sense, vips is doing exactly what they should be. They are saying "if speed for a single image one-and-done operation is what you need, vips is your tool, and gegl really isn't." That sounds like an extremely fair statement to me right now, until some of gegl's limitations in this area are addressed. And long term, why not?

-- Daniel

jcupitt@gmail.com
2016-01-29 17:52:29 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

Hello all, vips maintainer here, thank you for this interesting discussion.

On 29 January 2016 at 16:37, Daniel Rogers wrote:

> A fast 8 bit pipeline is great for previews or single operation stacks, or when accuracy is not as important for the user.

My feeling is that gegl is probably right to be float-only, the cost is surprisingly low on modern machines. On my laptop, for that benchmark in 8-bit I see:

$ time ./vips8.py tmp/x.tif tmp/x2.tif
real 0m0.504s
user 0m1.548s
sys 0m0.104s

If I add "cast(float)" just after the load, and "cast(uchar)" just before the write, the whole thing runs as float and I see:

$ time ./vips8.py tmp/x.tif tmp/x2.tif
real 0m0.578s
user 0m1.768s
sys 0m0.148s
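
To put a number on "surprisingly low", the two timings can be compared directly. This is only arithmetic on the figures quoted in this mail, a single run on one laptop, so it should be read as a rough indication rather than a measurement of float cost in general:

```python
# Timings copied from the two runs above, in seconds.
int8 = {"real": 0.504, "user": 1.548}
flt = {"real": 0.578, "user": 1.768}

def overhead_percent(base, other):
    """Relative slowdown of `other` versus `base`, in percent."""
    return (other - base) / base * 100.0

real_overhead = overhead_percent(int8["real"], flt["real"])
user_overhead = overhead_percent(int8["user"], flt["user"])
print(f"wall clock: +{real_overhead:.1f}%, CPU (user): +{user_overhead:.1f}%")
```

Both wall-clock and user time come out a little under 15% higher for the float run.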

Plus float-only makes an opencl path much simpler.

As you say, this tiny benchmark is very focused on batch performance, so fast startup / shutdown and lots of file IO. It's not what gegl is generally used for.

John

Adam Bavier
2016-01-29 19:07:30 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

As someone new to the gegl development list who has seen the performance numbers in that benchmark, I propose adding an asterisk * by each gegl number, which would help the reader understand that something is different about this library. Then add the corresponding asterisk down by the statement, "GEGL is not really designed for batch-style processing -- it targets interactive applications, like paint programs." Since gegl is the only interactive library in the list, the asterisk works well enough, and separating it out into a different table is not necessary.

Best regards, -Adam Bavier


Sven Claussner
2016-02-01 20:40:56 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

Hi Daniel,

thanks for sharing your thoughts. I agree with you on many points.

On 29.1.2016 at 5:37 PM Daniel Rogers wrote:

> * Anyone can do dynamic compilation nowadays with llvm. Imagine taking the gegl dynamic tree, and compiling it into a single LLVM dynamically compiled function.

What exactly do you mean? How is this supposed to work and where is the performance advantage if done at runtime?

Greetings

Sven

Daniel Rogers
2016-02-01 21:35:32 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On Feb 1, 2016 12:40 PM, "Sven Claussner" wrote:

> On 29.1.2016 at 5:37 PM Daniel Rogers wrote:
>> * Anyone can do dynamic compilation nowadays with llvm. Imagine taking the gegl dynamic tree, and compiling it into a single LLVM dynamically compiled function.
>
> What exactly do you mean? How is this supposed to work and where is the performance advantage if done at runtime?

To your first question: I made that statement as a counterpoint to vips turning a convolution kernel into a set of sse3 instructions and executing them.

I believe, though haven't proven rigorously, that a gegl graph is homomorphic to a parse tree of an expression language over images.

In other words, there exists an abstract language for which gegl is the parse tree.

For example:
a = load(path1)
b = load(path2)
c = load(path3)
out = a * b + c
write(out)

Given suitable types, and suitable definitions for *, =, and +, there is a gegl graph which exactly describes the program above. (For the record, I believe the language would have to be single-assignment and lazily evaluated in order to be homomorphic to the DAG of gegl.)
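
As a sketch of that correspondence, the small program above can be written so that the operators build a graph instead of computing anything, with evaluation deferred until the result is requested. The `Node` class, `load` and `evaluate` below are hypothetical names for illustration, not GEGL API, and flat lists of numbers stand in for images:

```python
class Node:
    """One node of a hypothetical image-expression DAG (not the GEGL API)."""

    def __init__(self, op, inputs=(), payload=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.payload = payload

    def __mul__(self, other):
        return Node("mul", (self, other))

    def __add__(self, other):
        return Node("add", (self, other))

def load(pixels):
    """Stand-in for load(path): wraps a flat list of pixel values."""
    return Node("load", payload=list(pixels))

def evaluate(node, cache=None):
    """Evaluate lazily, computing every node at most once."""
    cache = {} if cache is None else cache
    if id(node) in cache:
        return cache[id(node)]
    if node.op == "load":
        result = node.payload
    else:
        left, right = (evaluate(n, cache) for n in node.inputs)
        if node.op == "mul":
            result = [x * y for x, y in zip(left, right)]
        else:
            result = [x + y for x, y in zip(left, right)]
    cache[id(node)] = result
    return result

a = load([1.0, 2.0, 3.0])
b = load([0.5, 0.5, 0.5])
c = load([10.0, 10.0, 10.0])
out = a * b + c          # builds the graph only; nothing is computed yet
result = evaluate(out)   # plays the role of write(out): forces evaluation
print(out.op, [n.op for n in out.inputs], result)
```

Each name is bound exactly once and no pixel is touched before `evaluate` runs, which are the single-assignment and lazy-evaluation properties mentioned above.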

If that is the case, you can turn the argument on its head and say that gegl is just an intermediate representation of a compiled language. This makes the gegl library itself an interpreter of the IR.

Given these equivalencies, you can reasonably ask, can we use a different IR? Can we transform one IR to another? Can we use a different interpreter? The answer to all of these is yes, trivially.

So. Can we transform a gegl graph to a llvm IR? Can we then pass that LLVM IR to llvm to produce the machine code equivalent of our gegl graph?

If we did that, then all of the llvm optimization machinery comes for free. So I would reasonably expect llvm to merge operations into single loops, combine similar operations, reduce the overall instruction count, inline lots of code, reduce indirection, unroll loops, etc. Llvm has quite a few optimization passes.
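
A toy sketch of the "merge operations into single loops" idea, using Python's built-in `compile` as a stand-in for LLVM. The pipeline format and op names are assumptions made for this illustration and have nothing to do with real GEGL ops:

```python
# Hypothetical point operations as (name, constant) pairs. Not GEGL op names.
PIPELINE = [("mul", 1.5), ("add", 0.25), ("mul", 0.5)]

def interpret(pixels, pipeline):
    """Run op by op: one loop and one temporary buffer per operation."""
    buf = list(pixels)
    for name, k in pipeline:
        if name == "mul":
            buf = [p * k for p in buf]
        elif name == "add":
            buf = [p + k for p in buf]
        else:
            raise ValueError(name)
    return buf

def compile_pipeline(pipeline):
    """Build one expression for the whole chain and compile it once."""
    expr = "p"
    for name, k in pipeline:
        symbol = {"mul": "*", "add": "+"}[name]
        expr = f"({expr} {symbol} {k!r})"
    source = f"lambda pixels: [{expr} for p in pixels]"
    return eval(compile(source, "<fused-pipeline>", "eval")), source

fused, source = compile_pipeline(PIPELINE)
tile = [0.0, 0.5, 1.0, 2.0]
print(source)
print(interpret(tile, PIPELINE) == fused(tile))
```

The fused function makes a single pass over the tile with no intermediate buffers. A real LLVM backend would go much further, but the structure is the same: pay once to build the function, then call it for every tile.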

To your second question: the gegl tree is executed a lot. At least once for every tile in the output. This is especially true if gegl is used interactively and the same tree is evaluated thousands or millions of times with different inputs. Thus you would be trading an upfront cost of building the compiled graph for reduced runtime per tile, and reduced total runtime.

There are potentially more conservative approaches here, such as turning a gegl tree into a set of bytecodes and refactoring large chunks of gegl into a bytecode interpreter.

A really interesting follow up is just what other kinds of IR and runtimes can we use? Gegl to jvm bytecode? Gegl to Cg? Gegl to an asic? (FPGA, DSP, etc)?

--
Daniel

Øyvind Kolås
2016-02-02 12:37:55 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On Mon, Feb 1, 2016 at 10:35 PM, Daniel Rogers wrote:

>>> * Anyone can do dynamic compilation nowadays with llvm. Imagine taking the gegl dynamic tree, and compiling it into a single LLVM dynamically compiled function.
>>
>> What exactly do you mean? How is this supposed to work and where is the performance advantage if done at runtime?
>
> To your first question. I made that statement as a counterpoint to vips turning a convolution kernel into a set of sse3 instructions and executing them.
>
> I believe, though haven't proven rigorously, that a gegl graph is homomorphic to a parse tree of an expression language over images.

For a subset of operations this might work, but not for generic ops, which possibly use shared libraries rather than arithmetic in their implementation. An approach that might work for some of that subset, and that would permit reusing existing infrastructure in GEGL, is to recombine the cores of OpenCL point filters/composers and submit one image processing kernel for OpenCL compilation, which for many (most?) OpenCL implementations would end up using LLVM in the background.

This is however different from my complaint about the benchmark comparison, where VIPS is using a 3x3 convolution kernel and the GEGL code uses gegl:unsharp-mask, which is gegl:gaussian-blur + a point composer, which in turn is a horizontal blur and a vertical blur + a point composer. Comparing a 3x3 area op with a composite, much more general-purpose sharpening filter, which can use (and for the parameters provided already would use) a larger input area and which by its nature has more temporary buffers, is not a proper apples-to-apples comparison. Adding a gegl:3x3-convolution (or adapting gegl:convolution-matrix to detect the extent of the kernel) might make GEGL perform closer to VIPS on this benchmark, which caters well to VIPS features. I do however not think we should add "hard-coded" 3x3 sharpen/blur ops in GEGL.
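
The structural difference can be sketched in plain Python. The code below is an illustration under stated assumptions, not GEGL or VIPS code: it uses a fixed 3-tap blur as a stand-in for the gaussian, for which the composite happens to collapse into a single 3x3 kernel. With a larger blur radius, as in the benchmark's parameters, no such 3x3 equivalent exists.

```python
BLUR_1D = [0.25, 0.5, 0.25]  # assumed 3-tap blur, a stand-in for a gaussian

def clamp(i, n):
    """Clamp an index into 0..n-1 (edge pixels are repeated)."""
    return max(0, min(n - 1, i))

def blur_h(img):
    h, w = len(img), len(img[0])
    return [[sum(k * img[y][clamp(x + d, w)] for d, k in zip((-1, 0, 1), BLUR_1D))
             for x in range(w)] for y in range(h)]

def blur_v(img):
    h, w = len(img), len(img[0])
    return [[sum(k * img[clamp(y + d, h)][x] for d, k in zip((-1, 0, 1), BLUR_1D))
             for x in range(w)] for y in range(h)]

def unsharp_composite(img, amount):
    """Three passes and two temporaries: blur_h, blur_v, then a point composer."""
    blurred = blur_v(blur_h(img))
    return [[p + amount * (p - b) for p, b in zip(row, brow)]
            for row, brow in zip(img, blurred)]

def sharpen_kernel(amount):
    """The single 3x3 kernel equivalent to the composite for this 3-tap blur."""
    kernel = [[-amount * a * b for b in BLUR_1D] for a in BLUR_1D]
    kernel[1][1] += 1.0 + amount
    return kernel

def convolve_3x3(img, kernel):
    """One pass over the image, no intermediate buffers."""
    h, w = len(img), len(img[0])
    return [[sum(kernel[dy + 1][dx + 1] * img[clamp(y + dy, h)][clamp(x + dx, w)]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

img = [[0.0, 0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0, 1.0],
       [0.0, 1.0, 1.0, 0.0],
       [1.0, 1.0, 0.0, 0.0]]
composite = unsharp_composite(img, amount=1.0)
direct = convolve_3x3(img, sharpen_kernel(1.0))
max_diff = max(abs(a - b) for ra, rb in zip(composite, direct) for a, b in zip(ra, rb))
print(max_diff)
```

Both paths give the same pixels here, but the composite walks the image three times and allocates two temporaries, while the direct kernel does one pass.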

/pippin

Sven Claussner
2016-02-03 21:06:11 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

Hi,

these are interesting thoughts and I'm glad to read about mathematics and computer science that goes beyond the day-to-day topics.

On 1.2.2016 at 10:35 PM Daniel Rogers wrote:

> Given these equivalencies, you can reasonably ask, can we use a different IR? Can we transform one IR to another? Can we use a different interpreter? The answer to all of these is yes, trivially.
>
> So. Can we transform a gegl graph to a llvm IR? Can we then pass that LLVM IR to llvm to produce the machine code equivalent of our gegl graph?

To my understanding much of this is already covered by the existing GEGL graph language to C/OpenCL to binary code transformations. One interesting application of IR transformation would IMHO be graph optimization the same way query execution graphs in database systems are optimized. So, if we have a graph of many GEGL ops the commutative ops could be reordered to increase computation speed: (Color Tool, Crop) --> (Crop, Color Tool)

Equivalent or inverse ops could be merged into one single op: (Brightness + 10, Brightness -2) --> Brightness +8 or other ops working on a convolution matrix.

This might not work for all ops, e.g. (Blur, Crop) --> (Crop, Blur) would not work, because the Blur op processes adjacent pixels which get lost by cropping (if we don't leave the Blur radius as extra border for cropping).
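
A small sketch of both rewrites on a one-dimensional row of pixels. The op names and the `(name, argument)` tuple format are made up for this example, and brightness is an unclamped float addition here, which is what makes the merge exact:

```python
def merge_brightness(ops):
    """Merge adjacent ("brightness", delta) ops into one; other ops pass through."""
    merged = []
    for name, arg in ops:
        if name == "brightness" and merged and merged[-1][0] == "brightness":
            merged[-1] = ("brightness", merged[-1][1] + arg)
        else:
            merged.append((name, arg))
    return merged

def brightness(row, delta):
    return [p + delta for p in row]

def blur(row, _arg=None):
    """3-tap box blur with clamped edges."""
    n = len(row)
    return [(row[max(0, i - 1)] + row[i] + row[min(n - 1, i + 1)]) / 3.0
            for i in range(n)]

def crop(row, bounds):
    start, stop = bounds
    return row[start:stop]

OPS = {"brightness": brightness, "blur": blur, "crop": crop}

def run(row, ops):
    """Apply the ops in order to one row of pixels."""
    for name, arg in ops:
        row = OPS[name](row, arg)
    return row

row = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0]
chain = [("brightness", 10), ("brightness", -2)]
optimized = merge_brightness(chain)

# A point op commutes with crop; blur does not, because it reads neighbours.
point_then_crop = run(row, [("brightness", 8), ("crop", (2, 4))])
crop_then_point = run(row, [("crop", (2, 4)), ("brightness", 8)])
blur_then_crop = run(row, [("blur", None), ("crop", (2, 4))])
crop_then_blur = run(row, [("crop", (2, 4)), ("blur", None)])
print(optimized, point_then_crop == crop_then_point, blur_then_crop == crop_then_blur)
```

Cropping first is the cheaper order whenever it is valid, since every later op then touches fewer pixels. For Blur it becomes valid again if the crop keeps a border as wide as the blur radius and trims it afterwards.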

> To your second question: the gegl tree is executed a lot. At least once for every tile in the output. This is especially true if gegl is used interactively and the same tree is evaluated thousands or millions of time with different inputs. Thus you would be trading an upfront cost of building the compiled graph with reduced runtime per tile, and reduced total runtime.

OK, I agree here.

> A really interesting follow up is just what other kinds of IR and runtimes can we use? Gegl to jvm bytecode? Gegl to Cg? Gegl to an asic? (FPGA, DSP, etc)?

Interesting brainstorming thoughts, too. Considering them, I think the OpenCL approach is the most flexible one; it could possibly also be used to tap the computing power of DSPs.

Greetings

Sven

jcupitt@gmail.com
2016-02-06 17:26:43 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On 2 February 2016 at 12:37, Øyvind Kolås wrote:

> comparison. Adding a gegl:3x3-convolution (or adapting gegl:convolution-matrix to detect the extent of the kernel) might make GEGL perform closer to VIPS on this benchmark which caters well to VIPS features. I do however not think we should add "hard-coded" 3x3 sharpen/blur ops in GEGL.

I agree, it's not very fair. I tried with gegl:convolution-matrix, but it was a lot slower, I'm not sure why.

I realized that the tiff writer is writing float scRGBA, which is also not very fair. Is there a simple way to make it write a lower bit-depth image? Sorry for the stupid question.

John

jcupitt@gmail.com
2016-02-06 17:29:25 UTC (over 3 years ago)

VIPS and GEGL performance and memory usage comparison

On 29 January 2016 at 19:07, Adam Bavier wrote:

> numbers in that benchmark, I propose adding a asterisk * by each gegl number would help the reader understand that something is different with this library. Then add the corresponding asterisk down by the statement, "GEGL is not really designed for batch-style processing -- it targets interactive applications, like paint programs." Since gegl is the only interactive

That's a good idea, thank you. I've updated the page with numbered notes by some results to make the qualifications easier to find.

John