Extracting images in Word

Hi All
Don't worry, this isn't a job for the MS paperclip.
I'm passing in a Word document as a stream and want to extract the images that are embedded within it. Does anyone know how to do this in Java? If it helps, I'm only looking to extract JPEGs, GIFs and BMPs.
Thank you

There's a package called POI which can be used to generate Word docs, and I suppose it does some parsing of them as well. I've never used it, though.
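If the document is a .docx file (Office Open XML), it is a ZIP archive, and Word normally stores embedded pictures as separate parts under word/media/, so a plain java.util.zip pass over the stream may be enough without any extra library. The sketch below assumes .docx input; DocxImageExtractor and extractImages are hypothetical names, and it filters on the file extension rather than inspecting image headers. Legacy binary .doc files are not ZIP archives, so for those a library such as Apache POI (its HWPF component) would still be needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxImageExtractor {

    /**
     * Reads a .docx stream and returns the JPEG, GIF and BMP parts stored
     * under word/media/, keyed by their entry name inside the archive.
     * Assumes the Office Open XML (.docx) format, which is a ZIP file.
     */
    public static Map<String, byte[]> extractImages(InputStream docx) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        // Closing the ZipInputStream also closes the stream passed in
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith("word/media/") && isWantedImage(name)) {
                    images.put(name, readEntry(zip));
                }
            }
        }
        return images;
    }

    // Only the formats asked for: JPEG, GIF and BMP (case-insensitive extension check)
    private static boolean isWantedImage(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jpg") || lower.endsWith(".jpeg")
                || lower.endsWith(".gif") || lower.endsWith(".bmp");
    }

    // ZipInputStream.read returns -1 at the end of the current entry
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Each map value holds the image file bytes exactly as stored in the archive, so they can be written to disk unchanged.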

Similar Messages

  • How can I extract images from Numbers?

    This may sound a bit stupid, but how do you extract images from Apple's 'Numbers 3' on a Mac? I have a client who sends images in XLS files, which in Word I would normally Alt-click and 'Save as...'
    I was trying to stop using Word, but this simple little feature will make all the difference.
    Can anyone point me in the right direction?
    Thanks,
    Dave

    Hi Wayne,
    Thank you for your response. Sadly, if you do this it copies into the Mac's clipboard as a snapshot of the image at its dimensions in Numbers.
    E.g.
    An image I extracted from the document via Excel saved out at its original dimensions of 1024 x 300 px, even though it had been placed and then scaled down to 300 px wide on the sheet.
    The same image, copied and then pasted into an image editor (as I couldn't paste into the Finder), was only 300 px wide.
    The feature I need is to be able to save out the original image placed in the Numbers sheet.
    Anyone with ideas on this?

  • Emf images in word document convert badly to pdf using adobe plug-in

    I have a problem converting Word documents containing EMF images to PDF using the Adobe plug-in in the MS Word ribbon, which is provided with Adobe Acrobat XI Standard. The text and some of the image formatting in the EMF vector image either disappear or move within the resulting output when the document is converted to PDF. The issue also occurs when I use the "File > Save as Adobe PDF" option.
    Example image in word:
    Example result after pdf conversion:
    I have adjusted many of the settings to try and get the conversion to work.
    When I use Print and then select the Adobe printer, this image converts to PDF correctly. However, the Adobe Word plug-in is potentially very useful and a time saver, because it allows default settings (including security) to be applied every time, ensuring consistency. It is also important for us to be able to use a high-quality vector image.
    I hope someone can help here!
    Thanks
    Tom

    We had the problem last year for my last PhD student and his dissertation. It was an issue with vector graphics, not bitmaps. In his case, the lines on a curve were solid, dotted, and dashed, but all came out as solid. In the printed version it was correct. The only solution we found was to create the file from the plugin and a second file with the printer. Then the second file was inserted in the original with replace pages. This meant that all the markup, bookmarks, and such were retained, but the graphics worked. If you do not need the extra bookmarks and such of the plugin, then just set up the printer to do the security you want. If you always want the security set, set it in the printer properties in the Start>Printers menu of Windows.

  • White background of extracted image prints in pdf

    Hi'ya
    Using the "Extract" filter to crop a photo in Photoshop 7.0 allowed me to extract a person from a photo and it was saved as a .psd file.
    In my InDesign 2.0.1 layout, the psd file looks great and as it was an extracted image, I can put things in the foreground or background for a layering effect.
    I go to make a pdf and the pdf looks great on screen but when it comes to printing a white but invisible box prints around the psd file and blocks off items behind it.
    HELP!

    Did you export a PDF 1.3?
    And what kind of printing are you talking about? Offset or some office inkjet?
    I ask because in my experience the Acrobat preview (at least if Overprint Preview is active) represents the actual output very closely, so I wonder whether the printer or the RIP may be at fault.
    But actually I think the problem might be a better fit for the Acrobat forum, as Photoshop does not seem to be the cause of the problem.

  • When I copy and paste an image from Word for usage in a Framemaker document it becomes blurred.

    Hi,
    I am saving graphics from Word for use in a FrameMaker document. When I copy and paste the file into Photoshop it becomes blurred and unusable. How do I resolve this? I have tried saving the file at its highest resolution, but that doesn't work either.
    Thanks,
    Niall.

    The problem has to be the image itself, but then you give us zero information.
    We don't even know what platform you are on, Macintosh or Windows, what exact version of the OS, of Photoshop, or Word you are using, how you are getting what kind of image from Word, etc.
    It could be that you're just copying the low resolution thumbnail preview from MS Word, but we know nothing about its dimensions, resolution or format.
    Please read this FAQ for advice on how to ask your questions correctly for quicker and better answers:
    http://forums.adobe.com/thread/419981?tstart=0
    Thanks!

  • Which is better? Extracting images from directories or from database?

    Good day,
    I would like to start a discussion on extracting images (binary data) from a relational database. Although some might say that extracting images from directories is a better approach, I'm still sceptical about that implementation.
    My argument is based on the reasoning below:
    1. Easier maintenance - the system administrator can do backups from one place, which is the database.
    2. High level of security - can anyone tell me how easy it is to hack into a database server?
    3. Images are not dependent on the file structure - no more worries about broken links because someone might mistakenly change the directory structure. If there needs to be a change, it will be handled efficiently by the database server.
    The intention of my question is to find out:
    1. Why is taking images from a directory folder on the web server better than taking them from the database?
    2. How is this approach (taking images from a directory) scalable if there are thousands of images and text that need to be served?
    If anybody would be kind enough to reply, I would be most grateful.
    Thank You.
    Regards
    hatta

    Databases are typically more oriented towards text and number content than binary content, I believe. If you carry images in the database you will need to run them through your code and through your java server before they are displayed. If they are held in a directory they will be called from hrefs in the produced page, which means that they are served by your static server. This is quicker because no processing of the image is required. It also means the Database has to handle massively less data. Depending on the database this should be far quicker to query.
    It is worth noting that it is also quite difficult to actually change mime-types on a page to display a picture in the midst of HTML- the number of enquiries on these pages about this topic should be enough to illustrate this.
    If you hand over control of all the image file handling to your Java system (which I do when I write sites like the one you describe), then the program itself knows where to put the images and automatically adds them to the database. The system administrator never needs to touch them. If they want a backup, they save the database and the website directory. The second of those should be a standard administrative task anyway, so there is not a huge difference there. The danger of someone accidentally changing the directory structure is no greater than the danger of someone accidentally dropping a database table, and it can be minimised by making sure your administrators are competent. Directory structures can be changed back; dropped tables are gone.
    The security claim is slightly negated because you still have to run a webserver. Every program you run on your server is vulnerable to attack but if you are serving web pages you will have a server program that is faster than a database for image handling. You are far more at risk from running FTP or Telnet on your server or (worst of all) trying to maintain IIS.
    Keeping images in a directory structure is more scalable because very large databases are more likely to become unstable, and carrying a 50 KB image in every image field rather than 2 bytes of text makes each of those fields roughly 25,000 times larger. I have already mentioned the difference in serving methods, which also favours serving images as static files. A static site will be faster than a dynamic site of equivalent size, so take advantage of that where you can.
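    One common way to keep a directory-based store manageable with thousands of files is to spread them over nested subdirectories derived from the image ID, so the database row only stores a short relative path. This is not a scheme the reply above describes; it is a minimal sketch assuming numeric, non-negative image IDs, and ImagePaths and shardedPath are hypothetical names. It uses the lowest-order hex digits for the directory names so that sequentially assigned IDs spread evenly instead of piling up in one folder.

```java
import java.nio.file.Path;

public class ImagePaths {

    /**
     * Hypothetical helper: maps a numeric image ID to root/aa/bb/<id-in-hex>.<ext>,
     * where aa is the last pair of hex digits of the zero-padded ID and bb the pair
     * before it. The database would store only the ID or the relative path.
     */
    public static Path shardedPath(Path root, long imageId, String extension) {
        if (imageId < 0) {
            throw new IllegalArgumentException("imageId must be non-negative");
        }
        // At least 8 hex digits, e.g. 42 -> "0000002a"
        String hex = String.format("%08x", imageId);
        int len = hex.length();
        // Low-order digits change fastest, so consecutive IDs land in different folders
        String level1 = hex.substring(len - 2);
        String level2 = hex.substring(len - 4, len - 2);
        return root.resolve(level1)
                   .resolve(level2)
                   .resolve(hex + "." + extension);
    }
}
```

    For example, ID 0x12345678 with extension "jpg" under a root of images maps to images/78/56/12345678.jpg. Two levels of 256 directories give 65,536 leaf folders, so even a very large collection stays at a modest number of files per folder.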

  • Use Powershell to replace text with image in Word document

    I have a powershell script that uses a Word document as a template to create signatures that I am pushing out to my organization.
    The document is populated with text formatted the way I want the signature to look, that I then do a FindText and ReplaceText on.  This works fine for replacing text with text, but I can't figure out how to properly replace some of the holder text with
    an image and a link.  I found a few posts about adding images to word documents, but none that seem to work properly in this scenario.
    Any insight would be greatly appreciated, thanks!

    Dear BOFH,
    You are correct that the method I outlined is not for inserting an image into a signature block (which would be in Outlook, not Word). The links you posted certainly deal with Outlook signatures, well done... except that the question was about how to
    use a Powershell script to replace text in a Word document with an image.  Sure it was framed in the context of creating signatures, but the poster expressed that they already had a method of generating and replacing text, and just needed to know, as
    I did, how to do the thing they actually asked.
    Please BOFH... Please forgive my audacity in hoping to find a reference (any reference) to how to replace Word text with images via Powershell in a thread titled "Use Powershell to replace text with image in Word document".
    This is certainly a scripting question, and even something as simple as "You will need to call the .NET methods for the Word find/replace functionality.  Please ask in the Word forums for the correct method to use. 
    If you need help on calling .NET methods look HTTP ://here"support you offered combined with the contempt you offer in response to my actual substantive help to the actual question asked.
    BOFH, you are not better than us, just more arrogant.
    Can you please start your own question as this one has been closed.  Please see scripting guidelines.
    We cannot guarantee you satisfaction, as this is a user-supported forum. There is no SLA for community support. Perhaps if you posted a better-worded question as a new topic, someone might be able to help you resolve your issue.
    The topic you are posting on is closed and answered.
    ¯\_(ツ)_/¯

  • Extracting Images from EPS - Problem with the Output

    Hello,
    I have a big problem extracting images from an EPS file that was created in InDesign. The extracted image does not appear correctly.
    EPS source:
    %ALDImageFileName: Speicher:image.jpg
    %ALDImageDimensions: 30 30
    %ALDImageCropRect: 0 0 30 30
    %ALDImageCropFixed: 0 0 30 30
    %ALDImagePosition: 203.8677 344.8913 203.8677 359.2913 218.2677 359.2913 218.2677 344.8913
    %ALDImageType: 4 8
    %%BeginObject: image
    [14.4 0 0 14.4 203.868 344.891 ]ct
    snap_to_device
    Adobe_AGM_Image/AGMIMG_fl cf /ASCII85Decode fl /RunLengthDecode filter ddf
    <<
    /T 1
    /W 30
    /H 30
    /M[30 0 0 -30 0 30 ]
    /BC 8
    /D[0 1 0 1 0 1 0 1 ]
    /DS [
    [AGMIMG_fl 30 string /rs cvx /pop cvx] cvx
    [AGMIMG_fl 30 string /rs cvx /pop cvx] cvx
    [AGMIMG_fl 30 string /rs cvx /pop cvx] cvx
    [AGMIMG_fl 30 string /rs cvx /pop cvx] cvx
    /O 3
    >>
    %%BeginBinary: 1
    img
    p&G-p!<E*"!<<3"!!!&i!!30$!Vl`q!V$-m!WW3&!r`0#!Wh]h!!20]!WW3$qZ$^!!!!&f!!30$!WE'!
    !WE'$!<<*#nc/[l!W<!#!WW3%h>dNVnGiXl!<<-)!!*'"!<<2f!<*$!!<3)a!!!&t!!*-%kl:\ar;Zm"
    !!2He#6G2G(D[>Z!<*#e!!WKUEK,/l!r`0"!V$.!!Wi?'-t[mD-NjDN!!2Zk!!3#u"T\]4$OR%8!!!&h
    !!<6)*\I@D!\,8pq>g3h%KQP0';ta-rr2ho&-2b3mf3Y*Q/q,jk2D[9"76'h!=o#B!Y,22n,Nn!!!=Q!
    0.ee,0/"XS!VHHl!!E<'!"[l]rrUsd!Ug!r&'a/!k3(mej38#8mJm8&'`/UF'`J.(!"8l.+"o,p/hSe-
    /M8.8!!<6%!i#Yn!<2rs!2Ao^$NZFPk2tpjkND!jP3r;O%29TU'`o-b%e'H-!=:8,0/"h-.kN>(.gkeH
    "T^(Js8W&srrE$$rr;gE!V-4"""*$ljlY^djQYmmi$\'q!W`9(p([Q1nc/dn!#@(:r[nC..kE8)/emgU
    !<IHErs8W(rr<#trr76B!:p1!CZadBk2kmhj65^iD"%E-$5O?k'`&OE$M"0,!<<uh0.ee0/h\e,/h.k2
    !!*+arVm-$s8W)us8J2U!;Z[(L?.U[kN:sflf@0hK^f$D!=T\>'a5?f'c%2YnGit6/1iM*/1iA*.kWFi
    nc/dn!I+SBqYpWrrcdo5!<N60C$4[@ki_*hl0.?lD".K0!=9MV'c%S_'`f!Vnc01&!!t50/M8P+/h8J)
    .gH+F!!+PDrVllsrVlrp.KT,@#QauQip#Ifk3*$4ki:gb-3Nc<$j.4Z'GV>s'bqJt('F(,%06GL/hJ_.
    .k<;..j"f]!!32!rql]tr;WMmmf3c`k2tsgk2tmikEZ,Kn,NJ!'`/UJ'G_,X!V-6i!=L8*/2&S./M&=P
    !!2Ti"9AV]s8N#t"o\JY&c_t$!"-g0j6,RdkNBr!!r2ii!!!]1'a,9f'*84(!"Ar/""YuT/M8Y)!WW3$
    nc0%!!!!Z$s8N#ts76`i!"T,3!!3\bl0.?qjkC7H!!;`lrW!!#!>PMJ!u;Ub!Ug$f!=JlU/M/S*/ggZ#
    !!2Ng$3K)as8N&tjT#8]m/R@3k2b[ck03)j!>?4H'`\p6!!**(r[e6p.kWIZmJm:h!#kk:#.sp7ru_:B
    mJmV6jQ>U%kN:kH!!!)h!!!'!'`\gE'`eC:mf3^r!=pb8/ds9_.h;[J!!u[*s8GIerr6%#m/RGck3(kk
    kNM&%!Up'o!<<*)'bhDfrY5G=!Vufj!"K#0!#mOB.0(kM0,afj!V$-u!<A8^s6B[Es8S8_!U]poKBMCG
    !7'`iKECN@#RhF_'*&[\'b'X5r;[9-!$a!F*W[<7.jP)q!V-4#!<<,As8T>(W;lnB!!!&g!"63skFhmD
    jlEuS!!2Ng#S7^d%KHtR'bTm7!s/H(/-mj['`\n!/1Ukk!"985s8@'?Du]k7%KHP#!"8l0$KC71C'%l,
    jl$a?!!<6%!!3'6!X\o6rY5J8!!2Qh#Ri::0*_I`/1U`S%.4)q5Q:Zg2ukf6s8P=a!V$-t!]]?)jY?ff
    kN:qhmJmFp()7Pf!!N99!!Vcj$l1NL.KKPM/1iIi!!3'!o`,F%!dakFpBL[$pAb/A!Up's!H@5cf`qK\
    hWF!BmJmV%'GV;_!!*`J'FFm3!!WH(!?WpE,Q7ZH+tP;f!!2foq>^W-s8Tn6!!L"8s0r.!!">[bkcXsf
    !3>;FU]CPd!!<]G'bL?E!tZ.^&+9K$!\#`P)?'U>'eM$g!!2Zk!W`9$$g.KhLB.ATL&_2?"7Q9n!RU)n
    I/X*KGN@l9"Rc<t',;8j!!!$,'bhAP!!WQ+!"US5%fQG1%PKFQn,NUm!*fL'9)\en8,rO]!V-4"!rrBu
    jQ/].!WYOLl&b?6"9AcD'aFX;"9oSU%0>nt$NV(q!rr<%"YC_q"7Z@"!<<+Eq@Wc/!#,C9"9&;k!!E?)
    D;u0+!!<qtS-&0["T\T'$5EXA!!33:%da5t!<<]/!!3u<!V$-o!<<*#!FZ$-!!A;d!!2ipq#CKu!<?7$
    !!WK)KEM8T"RZ6i$2so,!<<Q0li77k!!30$!<</h!!!'!!!E<'!!!&c!"8o/!!*'#"98E'!WW;h!!!'!
    !!NB'!<<*#n,N[o!<<*#!!!'!!!!&f!!E<&!!*-"!!!'!!!**%o)K!s!!*-$!<<*#!!<-"!Wi?&l2Ueb
    rVup!q#Gp~>
    %%EndBinary
    %%EndObject
    Original image size: 553.9 KB
    After decoding, the size is only 1.8 KB.
    First try:
    import java.io.ByteArrayOutputStream;
    import java.nio.ByteBuffer;

    // PDFParseException comes from the PDFRenderer library (com.sun.pdfview)
    import com.sun.pdfview.PDFParseException;

    public class ASCII85Decode {

        private ByteBuffer buf;

        /**
         * initialize the decoder with byte buffer in ASCII85 format
         */
        private ASCII85Decode(ByteBuffer buf) {
            this.buf = buf;
        }

        /**
         * get the next character from the input.
         * @return the next character, or -1 if at end of stream
         */
        private int nextChar() {
            // skip whitespace
            // returns next character, or -1 if end of stream
            while (buf.remaining() > 0) {
                char c = (char) buf.get();
                if (!isWhiteSpace(c)) {
                    return c;
                }
            }
            // EOF reached
            return -1;
        }

        public static boolean isWhiteSpace(int c) {
            // 0=nul, 12=ff
            return (c == ' ' || c == '\t' || c == '\r' || c == '\n'
                    || c == 0 || c == 12);
        }

        /**
         * decode the next five ASCII85 characters into up to four decoded
         * bytes.  Return false when finished, or true otherwise.
         * @param baos the ByteArrayOutputStream to write output to, set to the
         *        correct position
         * @return false when finished, or true otherwise.
         */
        private boolean decode5(ByteArrayOutputStream baos)
                throws PDFParseException {
            // stream ends in ~>
            int[] five = new int[5];
            int i;
            for (i = 0; i < 5; i++) {
                five[i] = nextChar();
                if (five[i] == '~') {
                    if (nextChar() == '>') {
                        break;
                    } else {
                        throw new PDFParseException(
                                "Bad character in ASCII85Decode: not ~>");
                    }
                } else if (five[i] >= '!' && five[i] <= 'u') {
                    five[i] -= '!';
                } else if (five[i] == 'z') {
                    if (i == 0) {
                        // 'z' stands for four zero bytes
                        five[i] = 0;
                        i = 4;
                    } else {
                        throw new PDFParseException(
                                "Inappropriate 'z' in ASCII85Decode");
                    }
                } else {
                    throw new PDFParseException(
                            "Bad character in ASCII85Decode: " + five[i]
                            + " (" + (char) five[i] + ")");
                }
            }
            // Fix: pad a short final group with 'u' (84), not 0; with zero
            // padding the last decoded bytes can come out one too low
            for (int k = i; k < 5; k++) {
                five[k] = 84;
            }
            if (i > 0) {
                i -= 1;
            }
            // int arithmetic may wrap above 2^31 - 1, but the low 32 bits
            // are still correct for any valid group
            int value = five[0] * 85 * 85 * 85 * 85 + five[1] * 85 * 85
                    * 85 + five[2] * 85 * 85 + five[3] * 85 + five[4];
            for (int j = 0; j < i; j++) {
                int shift = 8 * (3 - j);
                baos.write((byte) ((value >> shift) & 0xff));
            }
            return (i == 4);
        }

        /**
         * decode the bytes
         * @return the decoded bytes
         */
        private ByteBuffer decode() throws PDFParseException {
            // start from the beginning of the data
            buf.rewind();
            // allocate the output buffer
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // decode the bytes, five characters at a time
            while (decode5(baos)) {
                // keep going until "~>" is reached
            }
            return ByteBuffer.wrap(baos.toByteArray());
        }

        /**
         * decode an array of bytes in ASCII85 format.
         * <p>
         * In ASCII85 format, every 5 characters represents 4 decoded
         * bytes in base 85.  The entire stream can contain whitespace,
         * and ends in the characters '~&gt;'.
         * @param buf the encoded ASCII85 characters in a byte buffer; it should
         *        hold only the data after the "img" line, since letters such as
         *        "img" are valid ASCII85 digits and would be decoded as data
         * @return the decoded bytes
         */
        public static ByteBuffer decode(ByteBuffer buf)
                throws PDFParseException {
            ASCII85Decode me = new ASCII85Decode(buf);
            return me.decode();
        }
    }
    Second try:
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    public class Ascii85Decode {

        private static final long[] POW85 = {85L * 85 * 85 * 85, 85 * 85 * 85, 85 * 85, 85, 1};

        private final byte[] in;
        // Decoded bytes are collected in memory; the previous FileOutputStream
        // was only closed from finalize(), which is not guaranteed to run
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        private Ascii85Decode(byte[] in) {
            this.in = in;
        }

        /** Returns the decoded bytes (previously this always returned null). */
        public static byte[] decode(byte[] in) throws IOException {
            Ascii85Decode ascii85Decode = new Ascii85Decode(in);
            ascii85Decode.decode85();
            return ascii85Decode.out.toByteArray();
        }

        // Writes the top 'bytes' bytes of the 32-bit tuple, most significant first
        private void wput(long tuple, int bytes) {
            for (int k = 0; k < bytes; k++) {
                putchar((int) (tuple >> (24 - 8 * k)));
            }
        }

        private void decode85() throws IOException {
            long tuple = 0;
            int count = 0;
            for (int i = 0; i < in.length; i++) {
                int c = in[i];
                switch (c) {
                    case 'z':
                        if (count != 0) {
                            throw new IOException("z inside ascii85 5-tuple");
                        }
                        putchar(0);
                        putchar(0);
                        putchar(0);
                        putchar(0);
                        break;
                    case '~':
                        if (i + 1 < in.length && in[i + 1] == '>') {
                            if (count > 0) {
                                // partial final group: round up, then emit count - 1 bytes
                                count--;
                                tuple += POW85[count];
                                wput(tuple, count);
                            }
                            return;
                        }
                        throw new IOException("~ without > in ascii85 section");
                    case '\n': case '\r': case '\t': case ' ':
                    case '\0': case '\f': case '\b': case 0177:
                        break; // whitespace and control characters are ignored
                    default:
                        if (c < '!' || c > 'u') {
                            throw new IOException("bad character in ascii85 region: " + (c & 0xff));
                        }
                        tuple += (c - '!') * POW85[count++];
                        if (count == 5) {
                            wput(tuple, 4);
                            count = 0;
                            tuple = 0;
                        }
                        break;
                }
            }
            throw new IOException("ascii85 data ended without ~>");
        }

        private void putchar(int b) {
            out.write(b);
        }
    }
    Does anybody have an idea how to solve the problem?
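    Two details in the EPS source may explain the small output. First, the image header sets up its data source as /ASCII85Decode followed by /RunLengthDecode, so after the ASCII85 step the bytes are still run-length encoded; neither attempt above applies that second filter. Second, /W 30, /H 30, %ALDImageDimensions: 30 30 and the eight-value /D array (four components, likely CMYK) suggest the EPS only embeds a 30 x 30 pixel image, which assuming 8 bits per component is at most 30 x 30 x 4 = 3600 bytes once fully decoded. The original 553.9 KB JPEG appears to be referenced by the OPI-style %ALDImageFileName comment rather than embedded, and the decoded samples are raw pixel data, not a JPEG file, so saving them as bild.jpg will not produce a viewable image. Below is a minimal sketch of the RunLengthDecode step as the filter is defined for PostScript and PDF; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class RunLengthDecode {

    /**
     * Decodes RunLengthDecode data:
     *   length byte 0..127   -> copy the next (length + 1) bytes literally
     *   length byte 129..255 -> repeat the next byte (257 - length) times
     *   length byte 128      -> end of data
     */
    public static byte[] decode(byte[] in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            int length = in[i++] & 0xff; // treat the length byte as unsigned
            if (length == 128) {
                break; // EOD marker
            } else if (length < 128) {
                int count = length + 1;
                if (i + count > in.length) {
                    throw new IOException("Truncated literal run");
                }
                out.write(in, i, count);
                i += count;
            } else {
                if (i >= in.length) {
                    throw new IOException("Truncated repeat run");
                }
                int count = 257 - length;
                byte value = in[i++];
                for (int k = 0; k < count; k++) {
                    out.write(value);
                }
            }
        }
        return out.toByteArray();
    }
}
```

    It would be fed the output of the ASCII85 decoder; the result would still be raw 30 x 30 samples that need converting into a real image format before saving.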

    A scan from string function can be used as per the attachment. The double \ ('\\') is in the format string in order to interpret the '\' characters in your input string as literal characters and not special formatting.
    Message Edited by Dennis Knutson on 07-10-2006 03:53 PM
    Attachments:
    Scan From String.JPG (5 KB)

  • Extract images from PDF out of Illustrator with script

    Looking for a script to extract images from a pdf opened in Illustrator.
    I need the images to extract separately to a folder. Jpeg perhaps.

    hi
    I have to do the same... I have to convert a PDF to an image format. Did you solve the problem? Can you help me?
    Thanks in advance...

  • Extract image and Features from the Catalog via JAVA API

    Hello,
    I would like to extract images from the Catalog via the Java API. Can anybody help with that? I also tried to extract the Features field from the Catalog, but it results in the error "Features field not found". Any ideas what could have gone wrong?
    Many thanks,
    Dharmi

    Hello,
    Can anybody tell me where I can find the latest Java API reference guide? I found the one for MDM 5.5 SP 1, but that also refers to the last parameter of CatalogCache.Init as int and not string.
    I looked in service.sap.com/instguides -> SAP Netweaver -> Release 4 -> Installation, and only the following 3 files are there for MDM 5.5 SP2:
    MDM 5.5 SP02 - Configuration Guide  SAP MDM
    MDM 5.5 SP02 - Installation Guide   SAP MDM
    MDM 5.5 SP02 – ERP-MDM Field Mapping and Check Tables
    Regards,
    Dharmi
    Message was edited by: Dharmi Tanna

  • Does anybody know how to detect the resolution of an image inserted in Word?

    Does anybody know how to detect the resolution of an image inserted in Word?

    I'm not sure I understand your reply but when you insert images into Word that are larger than will fit on the page, they are automatically compressed. If you right click and select Size and Position you can restore it to 100% but you will not see all the image.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge
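    For a .docx file, one way to work out the effective resolution is to compare the picture's pixel size with the size it is displayed at. In Office Open XML drawings the displayed size is given in EMUs (English Metric Units, 914,400 per inch), for example in the cx attribute of the wp:extent element in word/document.xml, while the pixel width can be read from the image part under word/media/ (for instance with javax.imageio.ImageIO.read(...).getWidth()). The sketch below only does the arithmetic; locating the matching extent and image part is left out, and EffectiveResolution and effectivePpi are hypothetical names.

```java
public class EffectiveResolution {

    // Office Open XML expresses drawing sizes in EMUs: 914,400 EMU per inch
    static final double EMU_PER_INCH = 914_400.0;

    /**
     * Pixels per inch at which an image of the given pixel width is shown
     * when its displayed width (e.g. wp:extent cx) is extentCxEmu.
     */
    public static double effectivePpi(int pixelWidth, long extentCxEmu) {
        if (pixelWidth <= 0 || extentCxEmu <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        double displayedInches = extentCxEmu / EMU_PER_INCH;
        return pixelWidth / displayedInches;
    }
}
```

    For example, a 1200-pixel-wide picture displayed 3,657,600 EMU (4 inches) wide works out to 1200 / 4 = 300 pixels per inch. The same calculation with the cy attribute and the pixel height gives the vertical resolution.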

  • Extracting images from pdf

    I am trying to extract images from PDFs using pdfimages, but I am unable to retrieve all the images. When I open the PDFs in Acrobat Reader 9.0, I can select the images that pdfimages retrieved with the Select tool, but for other figures/images I have to use other options, such as taking a screenshot and cutting out the relevant image. I was wondering why, or when, Acrobat treats these figures/images differently.

    Hi Dave,
    Thanks for the reply. My question was not about a non-Adobe product like pdfimages. It was about how Acrobat handles images in general when creating PDFs.
    I wanted to know why we can select some of the images in the PDF with the Select tool but cannot select others, for which we need to take a screenshot and cut them out. Is there anything in the EPS files of the included images that causes this?
    Thanks.

  • Can I extract images from PDFs using Batch Processing as I have many separate PDFs all with images t

    I have about 500 separate PDF pages that all need their images extracted; surely there must be a way to run a batch command on them?
    Please help! It will take me forever!

    Advanced>Batch Processing...
    Click the "New Sequence" button
    Name the sequence (e.g. Extract Images)
    Click the "Select Commands..." button
    Select one of the following items:
    - Export All Images As JPEG,
    - Export All Images As JPEG2000,
    - Export All Images As PNG,
    - Export All Images As TIFF
    Click the "Add" button
    Click the "OK" button
    Select your preference in the "Run commands on:" pop-up menu
    Select your preference in the "Select output location:" pop-up menu
    Click the "Output Options..." button
    In the "Output Options" dialog box, make your preference selections.
    Click OK
    Click OK
    Click the "Run Sequence" button.
    Sabian

  • Extracting Images from PDF file

    Hello All,
    I am reading a PDF file. I need to extract images from the PDF file programmatically, but the problem is that some images are stored inside the PDF file using the FlateDecode filter, and I need to decode that data first before I can extract the image. I don't know how to decode that image data. Is there any way or API to do that in C++?
    Thanks
    Aarti Nagpal

    I think you can do it through the Cos objects in a VC++ plug-in; look at PDEFilterSpec in the Acrobat Core API Reference.
    Be well..
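    PDF's FlateDecode filter is zlib/deflate compression (RFC 1950/1951), so any zlib implementation can undo it; in C++ that would typically be zlib's inflate() or uncompress(). To keep the examples here in one language, the sketch below shows the same step in Java with java.util.zip.Inflater, which expects the zlib header by default. It assumes you already have the raw stream bytes from the PDF; FlateDecode.decode is a hypothetical name. If the stream's /DecodeParms specify a /Predictor, that predictor still has to be reversed afterwards, and the result is raw pixel data described by the image dictionary (/Width, /Height, /BitsPerComponent, /ColorSpace), not a ready-made image file.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class FlateDecode {

    /** Inflates a zlib-wrapped stream such as the body of a PDF /FlateDecode stream. */
    public static byte[] decode(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // nowrap = false: expects the 2-byte zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Ran out of data before the end of the deflate stream
                    throw new DataFormatException("Truncated or dictionary-based Flate stream");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }
}
```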

  • Extracting Image Links From HTML

    Hi, at the moment I am trying to extract image locations from HTML files, i.e. the "src" part of the <img> tag. I have tried using HTMLParserCallback, as I know this allows you to extract links from a page. However, when I open a document that I know has <img> tags in it, handleStartTag() is not called for them; I only get all the other tags. Any idea how to solve this problem? Thanks very much,
    Ross

    Hi,
    Here's a portion of a class I wrote a while back....
    Note the useMap variable I introduced in the constructor.
    Some HTML files had the images in a "map" attribute, others in an "imgmap" attribute.
    regards,
    Owen
    // Needs: java.net.URL, java.net.MalformedURLException, java.util.Enumeration,
    //        javax.swing.text.MutableAttributeSet, javax.swing.text.SimpleAttributeSet,
    //        javax.swing.text.html.HTML, javax.swing.text.html.HTMLEditorKit
    private class CallbackHandler extends HTMLEditorKit.ParserCallback {
        private HTML.Tag tag;       // The most recently parsed tag
        private HTML.Tag lastTag;   // The last tag encountered
        private int nested = 0;
        private boolean useMap;

        public CallbackHandler(boolean useMap) {
            super();
            this.useMap = useMap;
        }

        public void handleText(char[] data, int pos) {
        }

        public void handleStartTag(HTML.Tag t, MutableAttributeSet attSet, int pos) {
        }

        // Tags without an end tag (such as <input> and <img>) arrive here, not in handleStartTag
        public void handleSimpleTag(HTML.Tag t, MutableAttributeSet attSet, int pos) {
            if (t.toString().equalsIgnoreCase("input")) {
                boolean imagemap = false;
                String name = null;
                String src = null;
                if (attSet instanceof SimpleAttributeSet) {
                    SimpleAttributeSet sattSet = (SimpleAttributeSet) attSet;
                    Enumeration<?> atts = sattSet.getAttributeNames();
                    while (atts.hasMoreElements()) {
                        Object att = atts.nextElement();
                        if (att.toString().equalsIgnoreCase("name")) {
                            name = (String) sattSet.getAttribute(att);
                            // got the name of the attribute
                            // Note : useMap is a boolean flag for you to set.
                            //        Some HTML pages used "map" attributes, others "imgmap" attributes
                            if (useMap) {
                                if (name.equalsIgnoreCase("map")) {
                                    imagemap = true;
                                }
                            } else {
                                if (name.equalsIgnoreCase("imgmap")) {
                                    imagemap = true;
                                }
                            }
                        }
                        if (att.toString().equalsIgnoreCase("src")) {
                            src = (String) sattSet.getAttribute(att);
                        }
                    }
                    if (imagemap) {
                        try {
                            // imagemapURL is a field of the enclosing class (not shown)
                            imagemapURL = new URL(src);
                        } catch (MalformedURLException malEx) {
                            System.out.println("Invalid Image Map URL : " + src);
                        }
                    }
                }
            }
        }

        public void handleEndTag(HTML.Tag t, int pos) {
        }
    }
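    The reason handleStartTag() never sees <img> is that img is an empty element (it has no end tag), so the Swing HTML parser reports it through handleSimpleTag(), the same method the class above uses for <input>. Below is a minimal sketch that collects the src attributes with the JDK's javax.swing.text.html parser; ImageSrcCollector and collect are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ImageSrcCollector extends HTMLEditorKit.ParserCallback {

    private final List<String> sources = new ArrayList<>();

    // <img> has no end tag, so the parser reports it here rather than in handleStartTag
    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.IMG) {
            Object src = a.getAttribute(HTML.Attribute.SRC);
            if (src != null) { // skip <img> tags without a src attribute
                sources.add(src.toString());
            }
        }
    }

    /** Parses the HTML and returns the src values of all <img> tags, in document order. */
    public static List<String> collect(Reader html) throws IOException {
        ImageSrcCollector callback = new ImageSrcCollector();
        new ParserDelegator().parse(html, callback, true); // true: ignore charset changes
        return callback.sources;
    }
}
```

    Relative src values are returned as written; resolve them against the page URL (for example with new URL(base, src)) if absolute links are needed.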

Maybe you are looking for

  • How do I make hyperlinks within a PDF (loaded on a website) open in a new window?

    I don't want the user to have to click the back button to navigate back to the webpage. When I made the button in the PDF, I used the action: Open a web link. When viewing the file within Acrobat it obviously opens the link through my browser, however

  • Set value to system matrix (38) cell

    Hello, thanks for your reply. I need to set the value for some UDF on a system matrix (38). Currently, I use the following code: public void SetValue(string columnUniqueId, int matrixRowIndex, string value); (I'm using the coresuite framework) The corespondi

  • Just a wish for ITunes DJ

    I have used iTunes for years to manage my music. I think it is a great system, and Apple has even been able to add features I wanted after listening to user feedback. In hopes Apple may listen to this short post, I would like a change in the funct

  • Locking Table contol

    How to lock a table control upto where data is showing in vertical column . After that new input can be made.

  • Cisco Aironet 3502i AP has no light

    Hi All, I just configured Aironet 3502i and it works ok but no lights. I could see the AP in switch, in WLC, and in WCS and ping ok but when you see it mounted it has no light. I tried to shutdown the port in the switch and I saw it turns green (boot