Filtering an HTML file

I have a requirement to be able to take an HTML file, parse it, and filter it so that I can automatically change the src attributes on image elements. Does anyone have any code to do anything like this in a simple way?

I took a look at Tidy, but couldn't find enough good documentation (read: examples) to figure out what to do. It seems to convert everything to a DOM, which I would then need to convert back into HTML. That looks like a lot of work; are there any examples?
As for regular expressions, is this a safe thing to do? It seems like there are many things that could go wrong. Again, does anyone have a simple example of doing it this way?
To reiterate my problem, I have an HTML file sitting on a disk. I'd like to convert it so that anything that looks like this:
<IMG height=1 alt="" src="files/blank.gif" width=1 border=0>
gets converted to:
<IMG height=1 alt="" src="myservlet?file=files/blank.gif" width=1 border=0>
Where the actual thing being substituted may depend on what was there before. This is a simplified example of a bigger problem, so a stand alone piece of code in Perl, etc. won't help me. Thanks for any help you can give.
Sander Smith
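
One way to approach the regex route asked about above is sketched here. This is a minimal sketch, not a tested production filter: the class name ImgSrcRewriter, the method rewrite, and the idea of passing the substitution in as a function are all assumptions made for illustration. It handles double-quoted, single-quoted, and unquoted src values, but like any regex approach it can be fooled by unusual markup (for example a ">" inside an attribute value, or img tags inside comments or scripts).

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the src attribute of every <img> tag in a string.
class ImgSrcRewriter {
    // Group 1: everything from "<img" up to and including "src=".
    // Groups 2-4: the src value when double-quoted, single-quoted, or unquoted.
    private static final Pattern IMG_SRC = Pattern.compile(
        "(<img\\b[^>]*?\\ssrc\\s*=\\s*)(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // 'mapper' decides the new value from the old one, so the substitution
    // can depend on what was there before.
    static String rewrite(String html, UnaryOperator<String> mapper) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String oldSrc = m.group(2) != null ? m.group(2)
                          : m.group(3) != null ? m.group(3)
                          : m.group(4);
            // Always emit the new value in double quotes.
            String replacement = m.group(1) + "\"" + mapper.apply(oldSrc) + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<IMG height=1 alt=\"\" src=\"files/blank.gif\" width=1 border=0>";
        // prints: <IMG height=1 alt="" src="myservlet?file=files/blank.gif" width=1 border=0>
        System.out.println(rewrite(in, src -> "myservlet?file=" + src));
    }
}
```

Everything outside the matched src value is copied through unchanged, so the rest of the file does not need to be converted to a DOM and back. If the value placed after "file=" can contain characters such as "&" or spaces, it would probably need URL-encoding inside the mapper as well.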

Similar Messages

  • How can I change the font in html-file

    Hello,
    if I create a new html-file in KM-Content - Folder - New - Html-File, I always get the font Times New Roman. In the editor I have Arial or something similar - but not in the view.
    Does anybody know how to change this font to Arial? Where do I have to configure what? Thanks
    Regards,
    Susanne

    Hi Susanne,
    Detlef
    "Say Detlef once again" (Detlev)
    sorry that I´m no mathematician or a SAP EP-consultant
    Never mind!
    So "dumb" is maybe the wrong word
    You didn't realize the ";-)" behind it?!
    It's just that this strict "it doesn´t work like you wrote" was a bit irritating, in fact it works, it's tested on the same version as you are on, and it's also not a very new feature (new features often are buggy at some ends).
    But now I had a second look at the link I gave - and sorry, even if it should help, it was in fact the EP50 version. The NW04 version, which it seems you have already found yourself, is http://help.sap.com/saphelp_nw04/helpdata/en/a8/923717bc40dd49b1e296b9877e98f0/frameset.htm
    After having seen this, I also found this ominous description of "HTML Filter for Portal Stylesheets": http://help.sap.com/saphelp_nw04/helpdata/en/cc/b8f941db015f24e10000000a1550b0/frameset.htm
    Yes, this does in fact not exist on EP6 SP9. You could open an OSS message and ask to delete the description of a non-existing feature or for delivery of it.
    Anyhow, it shouldn't do more than the HTML Stylesheet Filters are already doing - maybe it would choose the corresponding CSS by itself?
    Is there anything else I have to configure anywhere?
    The prerequisite is "The publishing pipeline is configured appropriately.", but this should be the case by default.
    The repository used is "documents" and the html file you have been using is stored in this repository?
    The stylesheet exists on the location set within the stylesheet configuration?
    There are not sooo many screws to turn...?!
    Hope it helps
    Detlev

  • View metadata near his image in a html file?

    Hi
    I am new to this board,
    I wanted to ask if there is a way to publish the metadata of an image (or filtered metadata) into an HTML file, for example next to the image.
    I want to publish photos with some of their metadata made readable in an easy way...
    Thank you in advance for looking at my problem
    Erwin

    Within Bridge you can select a batch of images and convert them to a Photoshop Web Photo Gallery.
    You can customize the page templates that Photoshop uses. I've not seen any documentation for metadata but some of the templates display caption, copyright, etc. Try searching or asking on the Photoshop forum.

  • Assigning a String Value to the value attribute of html:file

    Hi,
    we are facing a problem while assigning a value to the VALUE attribute of the file tag:
    <html:file property="fileupload" size='25' value="sample.xls" onchange="callsheet()" />
    Can anyone help me out, please?

    Two points -
    1) Java is not Javascript or HTML; you would do better to find a more appropriate forum.
    2) When you take this to a better forum, you need to describe the problem.

  • How can I edit an html file in Firefox using View Source?

    I have created an html webpage and saved it to my pc. I can't edit the html file in firefox when I right click on the page and select "View Source". I can view the code but I can't edit it. I can do this in IE9.
    I thought I was able to do this before in Firefox.
    Did my settings change somewhere that I can no longer do this?
    Do I have to have something checked in the Web Developer extensions for it to work?
    I'm using Firefox 24.0
    Thank you for any help with this. :)

    The View Source window in Firefox cannot be used to change computer codes. Nor can the Web Developer tools.
    Sorry for any inconvenience.
    I suggest using Programmer's Notepad.

  • Relative path for HTML file

    Hi ,
    This is my requirement. In the ESS Personal information overview page, I can add a HTML file to the additional information section.
    I have created a HTML file by creating a web module and enterprise application and deployed the same through NWDI.
    I declare a resource where I mention the path. Here I specified the complete path of the HTML file, like
    http://hostname:portname/addinfo/addinfo.html, which I know is the wrong thing to do, since when I move the HTML file to the production system it will still be accessing the HTML file in the dev system.
    How do I give a relative path here, so that the HTML file is accessed from the production system when we go live?
    Please suggest.
    regards
    sam

    Hi,
    One simpler solution is to use an AppIntegrator iView, as explained in the weblog "Integrating your Web Front-ends into the SAP Enterprise Portal using the Application Integrator" (https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/2786). It worked well for us.
    You can remove the user authentication part in the portalapp.xml if you don't need it.
    Regards,
    Martin

  • How to generate & publish  webhelp html files in different folder than default folder

    Hi All,
    I am new to RoboHelp. I have a requirement to generate and publish WebHelp HTML files in a folder other than the default folder.
    Could anyone please tell me the steps to change the folder?
    Thanks
    Rashmi

    You change the generate folder and filename in the first field on the first page of the wizard. It must be a folder on your hard disk.
    You change the publish folder in the last page of the wizard. Anywhere you like.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • Importing/adding html files with bookmarks to project/TOC

    Overview:
    I've been importing some old html files that have a massive number of bookmarks/anchors in them (few hundred html 'pages' some of them mini help document libraries) in an effort to move the manuals into RH format. Using RH11
    Problem:
    When I import them, all the bookmarks show up under each topic page. Second, when I add a folder to the TOC or auto-generate a TOC, it pulls in each bookmark as a page (see screenshot). Is there any way I can get around this issue? I can manually add (drag/drop) the 'updates' page here into the TOC and it doesn't pull in the bookmarks as pages. However, I'm interested in knowing whether I can simply not import the bookmarks, or exclude them from populating as pages in the TOC. This seems to be new in RH11 (I've used RH9 without this issue in the past, on a similar project).
    System;
    RH11
    Office 2013
    SharePoint 2010
    Thanks

    Hi there
    Note that when you choose to auto-generate a TOC, you have a choice as to whether to create links to bookmarks. Look at the dialog below:
    My guess is that in the older version this option wasn't enabled when you auto created a TOC and you really didn't pay attention to the way it was set in the new version.
    Cheers... Rick

  • Help needed  while exporting crystal reports to HTML file format using java

    I need help exporting Crystal Reports to HTML file format using the Java API (not using crystalviewer). I want to download the HTML file of the report.
    thanks

    The ReportExportFormat class does not have an HTML format; it has to be XML. Export to HTML is available from CR Designer only.
    Edited by: Aasavari Bhave on Jan 24, 2012 11:37 AM

  • HTML file is not being shown properly in the JEditorPane

    Hi,
    I am using JEditorPane to display an HTML file from the local disk. The HTML file contains HTML tables. When the file is displayed in the JEditorPane, the grid of the top row is not shown. The content of the row is there, but the column grid is missing. All other rows and columns are shown; only the grid of the first row, which contains the column headings, is missing.
    Also, when I print the content of this JEditorPane using the Java Print API, no grid is printed on the paper. The content comes out properly, but without table grids. When I print the original HTML file from the browser, the table grids are printed properly.
    Please help me show the HTML file properly in the JEditorPane and print it.
    Many Thanks,
    gshankar

    Hi,
    JEditorPane renders HTML with many limitations.
    You can use JDIC for this. Refer to: jdic.dev.java.net
    But JDIC does not work on Windows 98.
    Anand

  • Getting links and its names from a html file

    Hi everyone,
    My problem is about getting links together with their names from an HTML file. For example, a web page on this site has a link labelled "SUN"; when the user clicks SUN, the browser opens http://java.sun.com
    I want both of them: the link and its name. I have succeeded in getting the link, but I don't know how to get the link name.
    For example:
    <B>setRightComponent(Component)</B>
    In this code segment I want to get the content of the B tag, but I don't know how. To get the A tags I used this code:
    List<String> result = new ArrayList<>();
    try {
        // Create a reader on the HTML content
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());
        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
        kit.read(rd, doc, 0);
        // Find all the A elements in the HTML document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            SimpleAttributeSet s = (SimpleAttributeSet)it.getAttributes();
            String link = (String)s.getAttribute(HTML.Attribute.HREF);
            if (link != null) {
                result.add(link);
            }
            // Advance on every pass, whether or not an href was found
            it.next();
        }
    } catch (Exception e) {
        // Report any URI, I/O, or parse failure
        e.printStackTrace();
    }
    I can iterate over the B tags too, but I don't know how to get their value, because it is not stored in an attribute such as HREF...
    I am sorry if my explanation is unclear or I have used incorrect words.

    import java.io.*;
    import java.net.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;
    class GetLinks
    {
        public static void main(String[] args)
            throws Exception
        {
            // Create a reader on the HTML content
            Reader reader = getReader( args[0] );
            // Parse the HTML
            EditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
            // Ignore any charset declared inside the document while parsing
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            kit.read(reader, doc, 0);
            // Find all the A elements in the HTML document
            HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
            while (it.isValid())
            {
                SimpleAttributeSet s = (SimpleAttributeSet)it.getAttributes();
                String href = (String)s.getAttribute(HTML.Attribute.HREF);
                // The link text is the document text between the tag's offsets
                int start = it.getStartOffset();
                int end = it.getEndOffset();
                String text = doc.getText(start, end - start);
                System.out.println( href + " : " + text );
                it.next();
            }
        }

        // If 'uri' begins with "http:" treat as a URL,
        // otherwise, treat as a local file.
        static Reader getReader(String uri)
            throws IOException
        {
            // Retrieve from Internet.
            if (uri.startsWith("http:"))
            {
                URLConnection conn = new URL(uri).openConnection();
                return new InputStreamReader(conn.getInputStream());
            }
            // Retrieve from file.
            else
            {
                return new FileReader(uri);
            }
        }
    }
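
For the B tag part of the question, which the reply above does not cover directly, one possible approach is a parser callback that collects the text between <B> and </B>. This is only a sketch under assumptions: the class name BoldTextCollector and its collect method are made up for illustration, and it relies on the Swing HTML parser (ParserDelegator with HTMLEditorKit.ParserCallback), which is lenient but does not follow modern HTML parsing rules exactly.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: gathers the text content of every <B> element.
class BoldTextCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> texts = new ArrayList<>();
    private StringBuilder current = new StringBuilder();
    private int depth = 0; // how many <B> elements are currently open

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.B && depth++ == 0) {
            current = new StringBuilder(); // start a fresh buffer for an outermost <B>
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (depth > 0) {
            current.append(data); // text seen while inside a <B>
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.B && depth > 0 && --depth == 0) {
            texts.add(current.toString()); // outermost </B> closes the entry
        }
    }

    static List<String> collect(Reader reader) throws IOException {
        BoldTextCollector collector = new BoldTextCollector();
        // 'true' tells the parser to ignore charset directives in the document
        new ParserDelegator().parse(reader, collector, true);
        return collector.texts;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p><B>setRightComponent(Component)</B></p></body></html>";
        System.out.println(collect(new StringReader(html)));
    }
}
```

The same callback could be switched to HTML.Tag.A to collect link text, although the offset-based approach in the reply above already does that with the document model.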

  • How Do I Use the Help Tag/Help Path in LabVIEW to Link to a Specific tag in an HTML File?

    Is there any way to point the user to a tag in an HTML file when he clicks "Click here for more help"?
    Message Edited by zou on 03-08-2007 02:38 PM
    George Zou
    http://webspace.webring.com/people/og/gtoolbox

    George,
    I believe you are correct in saying that there is no way to link directly to a specific anchor tag within an html file from the context help.
    I would encourage you to visit our Product Suggestion Center if this is a feature you would like to recommend that our R&D team consider for future versions of LabVIEW.
    Is it possible for you to create a .chm file?  Or perhaps you could have some kind of "table of contents" at the top of your .html help file.  This would require an extra click by the user but may be an option for you.
    Regards,
    Simon H
    Applications Engineer
    National Instruments
    http://www.ni.com/support/

  • Multiple HTML-files in one stack / link another HTML-file in a stack

    Hi,
    I want to integrate a webapp (html/js) into a stack of a dps-folio. All the HTML works fine ... all no problem, but at clicking the first link on the initial HTML-document the next (local) HTML-File is opened in a slider with a "ready"/"Fertig"-bar at the top. But I want, that all html files are seamlessly loaded in the same view.
    At this point I found http://help.adobe.com/en_US/digitalpubsuite/using/WS67cb9e293e2f1f60-8ad81e812b10bfd837-8000.html, so I created the file "NativeOverlays.config" and moved it to the described location. After that, the behaviour was correct in the desktop preview app of the DPS, but it still fails in the Adobe Viewer on the iPad.
    How can I link between HTML files without any interruption or sliding?
    Thank you!
    Christian

    Hi Tim,
    Thanks for replying. I have looked for "PDFBookBinder class" in the XML Publisher user guide for version 5.6.2, but I didn't find any reference to it. Can you please point me to a tutorial or link where I can get more information about this class?
    Also, I originally had something similar to your second approach in mind as the basis of my design. The Oracle process generates the XML file in the output directory, which I can get. What I didn't understand is how I "pick them up and merge" using Publisher. Also, is there a way to do this merging using PL/SQL? Can you please give a little more information on your second approach?
    My original plan of action is to create a report set in which I call the Oracle seeded report for all 7 payrolls sequentially. Then, using the child requests of the report set, I will get the 7 XML files generated by the seeded Oracle process. Then comes the piece I am not sure of: I will use those 7 files to generate a single XML file with the payroll name as the top of the tree for each output. Once the single XML file is ready, I can easily design a template and register the process to generate the output as Excel. This process will not require me to change any data or do any calculation. It will only reformat the fields we see and give the ability to see all 7 payrolls at one time, rather than entering these numbers manually into Excel for analysis.
    Please provide your feedback, if you think above plan is not feasible or need corrections.
    Best Regards,
    Ankur

  • Elements do not show up in edge file but they do in HTML file; no errors occured

    I am a senior in college working on building a website for a fictional company for a year-long thesis graphic design project. In my class last semester, I learned how to use Edge Animate and created an 11 page website without any trouble. I had always kept the files saved to my flash drive, and kept a backup on my personal laptop as well as on my external hard drive. There was also a backup of the files on a disc that I turned into my professor in mid-December. All of these files worked fine at the time.
    In early January, I wanted to make changes to the website so I installed a trial version of Edge Animate on my laptop. When I opened any of the edge files from any of the backups, the stage was blank and the animations did not show up on the timeline. The elements of the website still show up in the Library but they are no longer arranged on the stage. When I got back to school at the end of January, I tried opening the files on the school computers and had the same problem. I asked my professor to try the file that was saved to the disc and he encountered the same issues on his computer. All of these computers are either iMacs or Macbook Pros.
    I have not received any error messages and did not do anything to corrupt the files or save over them. I am able to open the html files in a browser with no issues; all animations and images work fine. The only change I can think of is that Edge may have been updated over break, so that when I installed the trial version I installed a newer version, and for some reason the new version of the program could not properly load the project that I started in an older version in October or November of 2013.
    I am happy to upload the files but I am unsure how to do that. Please let me know if there is a way to resolve this issue, or will I have to start over?
    Thank you!

    Try clearing your preferences and restarting Animate. See if that fixes your problem:
    http://helpx.adobe.com/edge-animate/kb/restore-preferences-edge-animate.html
    If that does not work, see if there is a loopback address lookup.
    Check out the correct answer in this post to fix that: http://forums.adobe.com/message/6116991
    Let us know if this fixed your issue.

  • My imported backup html files from another computer do not show up after import

    I am currently transferring files from my macbook pro to a pc netbook. I already exported my firefox bookmarks via an external drive and accessed the html file on my new netbook. After importing the html on firefox on my new netbook, the bookmarks do not show up on my bookmark list.

    Please see this article on how to move your bookmarks and settings from one computer to another: [[Moving your Firefox bookmarks and settings]].
    Caution: Restoring bookmarks from a backup will overwrite your current set of bookmarks with the ones in the backup file.
