
(preferably Linux-based, OS) utility to extract images from image-based pdf files ...

This discussion is connected to the gimp-user-list.gnome.org mailing list which is provided by the GIMP developers and not related to gimpusers.com.

This is a read-only list on gimpusers.com so this discussion thread is read-only, too.

2 of 3 messages available

Albretch Mueller via gimp-user-list
2019-09-03 15:53:25 UTC (over 4 years ago)

(preferably Linux-based, OS) utility to extract images from image-based pdf files ...

The output of pdfimages is just a whole-page image when the input is a non-searchable, image-based PDF file. Take for example:

https://www.nysedregents.org/ushistorygov/Archive/20000126exam.pdf

Which utility would detect the cartoons on pages 6 and 7?

lbrtchx gimp-user-list@gnome.org

Guy Stalnaker via gimp-user-list
2019-09-03 16:06:43 UTC (over 4 years ago)

(preferably Linux-based, OS) utility to extract images from image-based pdf files ...

Albretch,

This will likely be a two-step process:

1. Extract page images from the PDF using a tool such as the one you indicate (pdfimages).
2. Extract subpage components from the separate image files, using a tool such as Luminoth: https://opensource.com/article/18/5/getting-started-luminoth

The Linux terminal makes such processing rather easy.
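
Step 1 can be scripted. The following is only a minimal sketch, assuming pdfimages from poppler-utils is installed and that PNG output is wanted; the helper name build_pdfimages_cmd, the local file name and the "page" output prefix are made up for illustration. It builds the command line for pages 6 and 7 of the example document and prints it rather than running it.

```python
import shlex


def build_pdfimages_cmd(pdf_path, out_prefix, first_page=None, last_page=None):
    """Build a pdfimages (poppler-utils) command that writes PNG files.

    Hypothetical helper for illustration; it only assembles the argument list.
    """
    cmd = ["pdfimages", "-png"]          # -png: write images as PNG files
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # -f: first page to scan
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # -l: last page to scan
    cmd += [pdf_path, out_prefix]        # input PDF, then output file prefix
    return cmd


if __name__ == "__main__":
    # Assumes the example PDF was saved locally under this name.
    cmd = build_pdfimages_cmd("20000126exam.pdf", "page", first_page=6, last_page=7)
    print(shlex.join(cmd))
    # To actually run it (requires poppler-utils):
    # import subprocess; subprocess.run(cmd, check=True)
```

For a scanned document this still yields one image per page, which is why the second step is needed.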

NOTE that I have not personally used a tool like luminoth so I cannot comment on its accuracy, but its website pages suggest it may have the capability to do what you want (well, using the example document you provided and the examples shown on its website).
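
For comparison, here is a much cruder way to look for large blocks on a page. This is not how Luminoth works; it is a plain row-projection heuristic, sketched under the assumption that the page has already been loaded and thresholded into rows of 0 (background) and 1 (ink). The function name find_row_bands and the toy page are made up for illustration. The idea is that a cartoon shows up as a tall band of inked rows, while a text line is a short one.

```python
def find_row_bands(rows, min_height=1):
    """Return (top, bottom) pairs for runs of consecutive rows containing ink.

    rows: list of rows, each a list of 0 (background) / 1 (ink) values.
    bottom is exclusive. Bands shorter than min_height rows are dropped.
    """
    bands = []
    start = None
    for y, row in enumerate(rows):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            if y - start >= min_height:    # band ended on the previous row
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows)))
    return bands


if __name__ == "__main__":
    # Toy "page": one thin text line, then a three-row block.
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(find_row_bands(page))                # [(1, 2), (3, 6)]
    print(find_row_bands(page, min_height=3))  # [(3, 6)]
```

Real scans are noisy and skewed, so blank rows are rarely perfectly blank; a tolerance on the ink count per row, and the same pass over columns, would be needed before this is useful on actual exam pages.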

Good luck!

Guy Stalnaker jimmyg521@gmail.com
