Clean up HTML source

The company that I work for has recently acquired a new
product and I have been assigned to work on the documentation. The
legacy docs that we have received are bigger than any help files I
have come across. For example, there's one project that
has about 1400 topics and its total size is 265 MB. There are at
least 10 more projects.
I opened one topic and realized that the people who worked on
it before me embedded the style info in the HTML code. Before I add
the new features etc., I would like to trim the file sizes. Cleaning
up each file manually is going to take a lifetime. Can you guys
suggest some automated tasks that will make my job a bit easier?
Much appreciated.

Hi dsind and welcome to the RH community.
Your task of cleaning up the code could be difficult and time
consuming. You may be able to use a find and replace tool (FAR is
my own personal favourite), but the problem you will have is the
syntax of the styles used in the code. Is there a pattern to the
code used? If not, the easiest option may be to manually go through
each topic. You could "trap" the majority of topics using a find and
replace tool first and then look at those topics that are trapped.
BTW 1400 topics is not particularly big so don't
worry.
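
If the embedded style info turns out to follow a pattern (for example, plain style="..." attributes on the tags), the find and replace idea can be sketched in a few lines of Java. This is only a rough illustration, not something from the thread: the class and method names are made up, it assumes the styles are quoted style attributes, and a regular expression is not a real HTML parser, so try it on a copy of the project first.

```java
import java.util.regex.Pattern;

public class InlineStyleStripper {
    // Matches a style attribute with a double- or single-quoted value,
    // together with the whitespace in front of it.
    private static final Pattern STYLE_ATTR = Pattern.compile(
            "\\s+style\\s*=\\s*(\"[^\"]*\"|'[^']*')",
            Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every quoted style attribute removed.
    public static String stripInlineStyles(String html) {
        return STYLE_ATTR.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String before = "<p style=\"margin:0; font-family:Arial\" class=\"body\">Hello</p>";
        // prints <p class="body">Hello</p>
        System.out.println(stripInlineStyles(before));
    }
}
```

The same pattern can usually be pasted into a find and replace tool that supports regular expressions. Unquoted attribute values and <style> blocks are deliberately not handled here.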

Similar Messages

  • New Safari 2.0.3 displays some Router Firmware pages as HTML Source text

      Hello Safari users and gurus.
    I installed the new 10.4.4 Combo Updater yesterday, preceded and followed by permissions repairs and other maintenance, including cleaning Safari Caches. Safari 2.0.3 works.
    My problem: 10.4.4's new Safari 2.0.3 displays HTML source text after I click "Save" from within my Linksys (latest firmware) router's web based administration pages that worked correctly with the previous version of Safari.
    When I check or change my router settings, the initial router settings pages appear as they did with the previous version of Safari. However, with 10.4.4's new Safari 2.0.3, as soon as I click "Save" to attempt to save a changed router setting, the confirming info from the router is displayed as HTML source code instead of the expected HTML page display.
    If I wait for the router lights to indicate activity has ceased and then reload the router's opening admin address, the page opens again normally, and I can navigate to other pages and see that the changes have been saved. However, any click on any "Save" changes button in the router's pages delivers another page of HTML source text.
    I will watch for future router firmware updates to see whether the issue is resolved from the Linksys side.
    Does anyone have any suggestions for improvement now? All assistance will be appreciated.
    EZ Jim

     Thanks glefand. As you suggested, I tried my old iBook G3 that is still running 10.3.9, and it works fine for me.
    My G5 DP also worked normally up through OS X 10.4.3. My problem only began after I installed the 10.4.4 update. I think the Safari update included with the 10.4.4 update is what is causing the issue with the firmware pages on my Linksys router.
    Hopefully a future router firmware update (or Safari update?) will restore proper operation. If not, I will continue to reload from the router's Start page. The need to work around this glitch is annoying, but the actual function of my router setup works without problem, even though the pages do not display properly.
    Thanks again for your helpful suggestion,
    Jim
    G5 DP 1.8, 4.5G RAM, 2x160GB Seagate, 1,000va UPS   Mac OS X (10.4.4)   20"ACD, iSight, AirportCard, Klipsch GMX A-2.1 Audio

  • PDF outputs continually crash after editing an html source file

    I have English and multiple localized versions of a RoboHelp webhelp project. I have created several PDF outputs for the localized versions. I can generate the PDFs fine when I don't touch any of the html source files.
    However, I had to make a change to one of the html source files to remove a <dl> from the source file. After making the change to the source file in an outside text editor (Notepad++), I went back to RoboHelp to generate the PDF, and the PDF output crashed.
    The reason I was using an outside text editor is that when I tried to edit the <dl> HTML tags in RH, its HTML editor deleted all text within the tags being edited.  There was no undo at this point.  In another attempt to use the RH Design editor, RH didn’t delete the <dd> or <dt> tags when those tag styles were changed to <h4> and <p>, respectively. RH inserted the new html wrapper inside the existing ones, instead of replacing it.
    Also, during the PDF generation process, a Visual Basic error message will appear dozens of times stating that the application was interrupted.  I must press OK to continue for each error message.
    What gives with the PDF generation failing like this?

    I hate to say it but it sounds like something you are doing in the external editor is not liked by RoboHelp.
    It could just be the way that text editor codes the file. I use EditPad Pro and that has various options for text encoding. My guess is your editor is not set to what RoboHelp expects.
    After that, what I would do here is create a new one-topic project and recreate the problem with the minimum of text. It may help you spot the code RoboHelp does not like.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • Java app to save html source and copy to clipboard

    I'm new to Java, and I'm doing a research paper which requires gathering data from the internet. I basically need to learn commands to create an app (which I will execute on a daily basis using Windows scheduler) to do the following task:
    1. Access a specific URL
    2. Save the HTML source as a text file
    3. Copy some of the text in the saved text file to the clipboard in Windows
    4. Paste this data in Excel
    5. Run a macro in Excel to sort the data
    I know enough VBA and Excel functions to parse the data, but if it is easier to use Java to sort the data into columns and rows in Excel, I would like to learn how to do that too.
    I'm running Windows XP. My friend tells me to download Dr. Java to edit .java files and also the SDK to compile the files. He also told me to find modules from the internet which I can use, but I'm not sure how to go about it. (I tried searching on the net but didn't get the hits I wanted.)
    Any help at this point would be much appreciated.

    If your target HTML is fancy and complicated (a table cell containing another table), or the table is missing some <tr> or <td> tags, dividing the table into rows and columns can be a difficult task.
    I recommend you choose a website with high download speed. Otherwise, you have to write more code to handle any loading failure.
    I recommend you use httpunit to get the web page with a Java program. (There is usually a proxy server in a university; you can configure the proxy server parameters in the Java program calling httpunit.)
    http://httpunit.sourceforge.net/
    It will take some time to learn those Java techniques.
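
    To give an idea of what "dividing the table into rows and columns" looks like in the easy case, here is a rough sketch using only the standard library. It is an illustration under assumptions, not httpunit code: the class and method names are hypothetical, and it only copes with simple, well-formed tables. Nested tables or missing <tr>/<td> tags will break it, which is exactly the difficulty described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTableExtractor {
    // One table row: <tr ...> ... </tr> (non-greedy, may span lines).
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // One cell inside a row: <td ...> ... </td> or <th ...> ... </th>.
    private static final Pattern CELL = Pattern.compile(
            "<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one tab-separated line per table row.
    public static List<String> toTabSeparatedRows(String html) {
        List<String> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                // Drop any tags left inside the cell and trim whitespace.
                cells.add(cell.group(1).replaceAll("<[^>]+>", "").trim());
            }
            rows.add(String.join("\t", cells));
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>a</td><td><b>1</b></td></tr>"
                + "<tr><td>b</td><td>2</td></tr></table>";
        for (String line : toTabSeparatedRows(html)) {
            System.out.println(line);
        }
    }
}
```

    Tab-separated lines were chosen here because Excel splits them into columns when pasted or opened as a text file.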

  • Help in creating a Java class to convert Html source to XML

    Hi Everyone!
    I am using selenium as my automation tool
    I got the htmlsource of the page using selenium.
    Now I have to write a Java class which will convert the data (html source)
    and output a data structure in standard XML format.
    Can anyone give me example code (a Java class) showing how to achieve this, please?
    Please mail to
    [email protected]
    Many thanks

    getafix14 wrote:
    Can anyone give me example code (a Java class) showing how to achieve this, please?
    Please mail to
    [email protected]
    Sorry Charlie, but this isn't a "mail me the codez" forum. Either use Google, or try to share your problem and their solutions in the forum. Besides, a little Googling will find you a solution.

  • How to get the html source for these web page ?

    My code works well for standard pages, but I'm unable to get the HTML source from these pages with my VB program:
    http://www.slashdot.org
    http://userfriendly.org
    http://segfault.org
    Here is my code:
    private sub commandgethtml_Click ()
    Inet1.Cancel
    Inet1.Protocol = icHTTP
    Inet1.URL = theURL
    HTMLcode = Inet1.OpenURL(theURL, icString)
    RichTextBox1.Text = HTMLcode
    end sub
    thanks in advance.

    Hello Cyrano,
    This Developer Forum focuses on the National Instruments product "Measurement Studio for Visual Basic" (formerly known as ComponentWorks). Our goal is to help people to better integrate this product into their test, measurement, and automation applications. Your question directly pertains to the Microsoft Internet Transfer Control. I think you would find an increased number of responses that are better focused on your question if you would repost it to a forum that specializes in general VB and internet programming. Good luck!
    Jeremiah Cox
    Applications Engineer
    National Instruments
    http://www.ni.com/ask

  • CQ page is not rendering properly. It is not rendering HTML. It is showing HTML source code as is.

    On some of the pages, I am getting this weird behavior wherein page is not rendering properly. It is showing HTML source code as is. Could you please help me out? What could be the issue? And how can we get rid of the same?

    Check your component JSP page. It is possible that it is just a plain file without directives (<%@ ... %>), or you might have missed closing a tag, which causes the source to be rendered as text.
    Paste your jsp code in case you need further help
    Thanks,
    Ajit

  • I see the license agreement as HTML source code

    Hello,
    I downloaded a trial version of Adobe Acrobat 9.0,
    but after I had installed it and started it, it wants me to agree to the license agreement.
    But my problem is that I can't click on Agree or Disagree because I see only HTML source code.
    See the printscreen I added below.
    Does anyone know what is wrong, or what I have to install to 'use' these HTML pages within Adobe Acrobat?
    I also downloaded the software again and installed it again, and it also happens with trial versions of Photoshop.
    I have not tested other software so far.
    Thanks in advance
    Kees
    PS: I am dutch so my english is not perfect.

    You seem to have missed this thread:
    http://www.adobeforums.com/webx/.3bc424c7/
    I understand Liam's suggestion has worked for many.

  • Characters missing in HTML source

    In ApEx 3.0.0.00.20 on Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod I have problems viewing the HTML pages in frontend. I miss some characters in the HTML source. The contents displayed are loaded from the database where they are saved correctly. When reloading the page, the errors appear randomly in other places in the source.
    Example:
    <b>(Textb></b>
    instead of
    <b>(Text)</b>
    The missing characters are not acceptable if buttons do not appear or processes do not run. The application also runs on another instance without those bugs, so this is not an application error.
    What I would like to know is whether anyone else has had the same problem, and whether it is a performance error or whether the error lies in the database or in the ApEx installation.
    Thanks in advance for your replies,
    Patrick
    Edited by: Pennypacker on Sep 16, 2009 11:00 AM
    Edited by: Pennypacker on Sep 16, 2009 11:01 AM
    Edited by: Pennypacker on Sep 16, 2009 11:02 AM

    Sorry for my bad example, this might confuse because I cannot add HTML tags here. The problem is not that the text is bold or not. I try to give more examples to make the problem clear.
    Example 1:
    input type="radio" name="f15" value="1" />1
    The < character is missing in front of the line, so no radio button is displayed and the HTML source code is shown as text in the browser.
    Example 2:
    Shown Text: zu helfennd einige Fragen zu beantworten
    Text in Database: zu helfen und einige Fragen zu beantworten
    +2 Characters missing+
    In the example above I meant that also the '<' character was missing and the bold-tag was not closed so that all text was shown bold.
    So the HTML code is not generated correctly by ApEx and randomly 1 or 2 characters are missing, which makes the site unusable. Any ideas?

  • Getting HTML source of current page

    Hi All,
    I generate an HTML report using an HTML DB application. Through the same application I want to get the source of this HTML page/report & send it as an inline HTML attachment.
    Can any body tell me how I can get the HTML source of the current page?
    Thanks,
    Ayush

    Hi
    Follow this: PCD Location of Page Dynamically
    BR
    Satish Kumar

  • Html source

    I'm new to apex. I wrote my HTML source code in the html body attribute section that is available in the page. I created a link, and I want it to appear in the main region (the reports region), but it appears at the top of my page when I run the page. What should I do about that?

    Hi,
    you have to create regions, position them on the page, and put your HTML source in the Region Source.
    The "html body attribute" must only contain attributes for the <body> tag of the page (for example onload="...").
    I suggest you read the documentation.
    Yann.

  • How to read the html source code of a webpage.

    How can I read the html source code of a webpage with a java application?
    Is there a good idea?

    >
    How can I read the html source code of a webpage
    with a java application?
    Is there a good idea?
    I don't know if this is a good idea, but it works.
    1) Use a URL to obtain the document's location
    2) Use a URLConnection to open a connection between your computer and the
    document server
    3) Connect to the server
    4) Get the InputStream of said connection
    5) Wrap the InputStream in an InputStreamReader and then a BufferedReader
    At this point you can use a loop to read lines from the BufferedReader and append them to a TextArea or other suitable text component.
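
    The steps above can be sketched roughly as follows. Treat it as a minimal illustration rather than the one right way: the class and method names are invented for the example, the page is assumed to be UTF-8 (a real page may declare another charset), and it reads lines through a BufferedReader because that is the class that offers readLine().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PageSourceReader {
    // Reads every line from the stream and joins the lines with '\n'.
    // Assumption: the content is UTF-8 encoded.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Steps 1 to 4: build the URL, open and connect, get the InputStream.
    public static String fetch(String address) throws IOException {
        URL url = new URL(address);
        URLConnection connection = url.openConnection();
        connection.connect();
        return readAll(connection.getInputStream());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageSourceReader <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

    Instead of printing, the returned String can be appended to a TextArea or written to a file.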

  • How to get the HTML Source code from the active browser ?

    Hi All,
    I need to get the HTML source code from the active browser (IE). I tried the code below, but I am not able to get the source code every time: it varies between applications (http or https), and in a few applications the user authentication has to be changed (_I don't know how, or am not able, to supply that in the code below_). Moreover, getting the HTML source code also depends on the URL.
    Therefore I feel that getting the HTML source code from the given or active browser will be more consistent than using the URL, since the source code is already available in the browser (IE). Please help me with sample code to achieve this!
    // "kit" was not declared in the post; a javax.swing.text.html.HTMLEditorKit is assumed
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    URL url = new URL(strURL);
    Reader HTMLReader = new InputStreamReader(url.openConnection().getInputStream());
    kit.read(HTMLReader, doc, 0);
    Thanks in advance,
    Regards,
    Jothi Venkatachalam
    Edited by: j0o on May 7, 2009 3:11 AM

    The simple answer is: you don't.
    Not only is it simply not possible, but the entire concept of "the active browser" doesn't exist.
    You were on the right track with your code to retrieve the page directly from the server, but as you noticed that code will only work for regular http connections.
    For https and other protocols you will need to use appropriate libraries for each protocol. Something like Apache Commons can help you with that. There are networking libraries in there for a lot of commonly used protocols.

  • View the html source code of an apex page

    Hi everyone,
    I am looking for a way to view the html source code of an apex page and to be able to modify it. That's why viewing the html source code from the browser while the application is running doesn't suit me.
    Does anyone have an idea how this can be done?
    Best regards,

    Khadija Khalfallah wrote:
    Hi everyone,
    I am looking for a way to view the html source code of an apex page and to be able to modify it. That's why viewing the html source code from the browser while the application is running doesn't suit me.
    What do you mean?
    Do you want to be able to pull up the HTML source generated by Apex, modify that copy, and then feed it back into Apex with the changes you made? If so you can't. Apex generates the HTML through its tools and you have to modify the generation routines to get different HTML.
    Do you merely want to look at the generated HTML? In Internet Explorer all you have to do is right click on the page and choose View Source to open a window with the HTML source in an editor. I sometimes find it useful to save a page and manually edit the copy to immediately see the effects of certain changes to the underlying HTML and/or Javascript without permanently making the change in Apex.

  • Are we allowed to use the Web developer function in Firefox version 5.0 to edit the html source code associated with the Firefox home page?

    Locking at request of OP - https://support.mozilla.com/en-US/questions/844506
    Are we allowed to use the Web developer function, under the "Firefox" tab in Firefox version 5.0, to edit the html source code associated with the Firefox version 5.0 home page ( so that we can personalize the home page )? Is this legal?
    Sincerely in Christ,
    Russell E. Willis

    Solution: (Free Download Manager)
    Go here: http://codecpack.co/download/Free_Download_Manager.html and download Free Download Manager 3.8.1067 Beta 3, it works perfectly with Firefox 5.0.1
    Solution: (to Google mail aka Gmail)
    I have had this problem for a while since I did a previous Firefox update, where I had to force Gmail to load in Basic HTML else it's next to impossible to use it. The solution is this: simply update your Java, and Gmail will work without a problem using Standard HTML. To update your Java go here: http://www.java.com/en/ and select "Free Java Download".
    And beta normally, universally, means "the not quite there yet version of the version we're aiming for" NORMALLY used during production and testing of a type of software.
