Finding Headings and Link Text in MARS files

Hi,
I've been looking into the MARS package and have been able to
extract some great information for my application to read from the
underlying XML, but I'm stuck on a few key items. Does anyone know
how to reliably and programmatically find the following items
throughout a MARS package?
- Headings
- Hyperlink TEXT (I can get the Rect & Dest from the
/page/#/pg.can file but not the text itself)
- Tables & Graphics (where they start and end in the
svg?)
What I'm looking to do is basically build a big inventory of
all these common objects from a .mars file and report them to the
user. I'm just using DOM to read the XML/SVG files.
Thanks!

Hi,
Sadly, there is probably no easy way to do the things you are
asking. Mars files generated from PDF are reliant on the
fundamental properties of PDF. Below I will discuss each of your
points:
Headings:
Depending on the producing application and the settings used
to convert to PDF, there might be some slightly more intelligent
way to do this. A PDF can be "Tagged", such that it has a logical
structure tree. If your sources are very similar in nature (i.e.
same producer and same settings) and come from a producer that
creates Tagged PDF, then it is possible you can use the logical
structure to locate headings. However, if the PDF does not have
this information, the only indication you have that a given block
of text is a heading is that it is bigger than the rest of the text
and perhaps begins with a number. If the PDF is Tagged, then look
for a struct.xml file in the same location as the page. This
references a marked content section in the SVG, which is a named
group.
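
For untagged sources, the "bigger than the rest of the text" heuristic
can be sketched roughly as below. This is only an illustration in
Python, not anything Mars provides: it assumes each SVG text element
carries its own font-size attribute (sizes inherited from a parent
group or set through style are ignored), and the function name
heading_candidates is made up for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "{http://www.w3.org/2000/svg}"


def heading_candidates(svg_text):
    """Return text runs whose font-size exceeds the page's dominant size."""
    root = ET.fromstring(svg_text)
    runs = []
    for el in root.iter(SVG_NS + "text"):
        size = el.get("font-size")  # assumption: size is set on the element itself
        text = "".join(el.itertext()).strip()
        if size is None or not text:
            continue
        # Keep only the leading number so "18", "18px" and "18pt" all parse.
        match = re.match(r"[0-9]*\.?[0-9]+", size)
        if match:
            runs.append((float(match.group()), text))
    if not runs:
        return []
    # Weight each size by character count so that body text dominates.
    weights = Counter()
    for size, text in runs:
        weights[size] += len(text)
    body_size = weights.most_common(1)[0][0]
    return [text for size, text in runs if size > body_size]
```

Anything this returns is only a candidate: large text can just as
easily be a title, a pull quote or a caption.
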
Hyperlinks:
Sadly, there is very little that can be done here. PDF
represents links as rectangles on the page. There is no direct
association with the text that the user sees as a link. In HTML
links are described as anchors linking to another location with a
specific piece of text representing them. In PDF, it is just a
rectangular overlay on the page. Any change in color (or an
underline perhaps) is a text property and not part of the link.
This is for historical reasons in PDF and Mars does not currently
try to reconcile these pieces of information.
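
If you want to attempt the reconciliation yourself, one rough approach
is to collect the text whose position falls inside the link rectangle.
The Python sketch below is a heuristic under several assumptions: the
Rect has already been converted into the same coordinate space as the
SVG (PDF measures y from the bottom of the page, SVG from the top),
each text element has a single numeric x and y attribute, and no
transforms apply. The name text_in_rect is invented for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def text_in_rect(svg_text, rect):
    """Join the text whose (x, y) origin lies inside rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    left, right = min(x0, x1), max(x0, x1)
    low, high = min(y0, y1), max(y0, y1)
    found = []
    root = ET.fromstring(svg_text)
    for el in root.iter(SVG_NS + "text"):
        try:
            x = float(el.get("x", ""))
            y = float(el.get("y", ""))
        except ValueError:
            continue  # missing or multi-valued position: skip this element
        if left <= x <= right and low <= y <= high:
            text = "".join(el.itertext()).strip()
            if text:
                found.append(text)
    return " ".join(found)
```

A text run that starts outside the rectangle but extends into it will
be missed, so treat the result as a best guess.
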
Tables & Graphics:
Well, tables are hard. There is no "table" operator in PDF,
so tables are created using paths to draw the lines and text
operators to place the text inside the table. Again, if you have a
Tagged PDF, it is quite possible that the table has been marked in
the struct.xml and the content identified, but as above, this is
highly dependent on the nature of the source that was used to
generate the PDF. The same is true for graphics. Graphics created
with paths and other drawing operators do not necessarily get
grouped as such, but a Tagged PDF (and therefore a Tagged Mars
file) might have them.
Conclusion:
So, to summarize the above, there are possibilities and if
you have control over the sources, there is a chance that you can
easily obtain the information you require. However, if you have a
number of disparate sources, then I suspect there is no easy
solution for you.
If you have any more specific questions or more details of
the nature of your sources, perhaps there might be something you
can look for. Also, let me know if you need more information
regarding our logical structure implementation in Mars. As always
with PDF, you are restricted by the quality of your sources, and
third-party PDF creation tools do not necessarily add all the
information that you need.
Matthew

Similar Messages

  • Can't access the "Apple" icon in upper main menu to shutdown my MacBook Pro running 10.6.8!  Also can't open a new finder window and main menus such as File and Edit are sluggish to open or don't open at all.  Doesn't happen each time I attempt to shutdow

    There are some keyboard commands (shortcuts) you can use instead of having to go to the Apple menu -
    Control-Eject: This brings up the Restart-Sleep-Cancel-Shutdown window.
    Command-Option-Eject: This puts the machine to sleep.
    Command-Control-Eject: This closes all apps and restarts the machine.
    Command-Option-Control-Eject: This closes all apps and shuts the machine down.

  • I have a friend who I was helping out with his iPhone. He has since lost it. I believe I backed it up to my machine but have no idea where to find it. Is it possible to find it and read the back-up file without the original device being connected?

    If he gets another device, you should be able to restore it from your computer to the point where you backed it up.
    Otherwise, that file cannot be read.

  • Can I create a custom table of contents and link to other .pdf files based on responses to a form?

    Hey Everyone! First post ever, so bear with me:
    I'm trying to create a streamlined method to use a form  to let myself and others add information and select certain options to put together a custom table of contents. Basically, I would like to have a form with a series of text fill and single/multiple choice options that will automatically populate a table of contents based on the selections and will link to other .pdf files that are associated with the selections. I was hoping this would be possible with a form, but I'm relatively new to the function of the software as a whole and my research came up short. Any suggestions on how to start are more than welcome, and if I wasn't quite clear enough I would be happy to elaborate.
    Thanks for your time!

    You would need to search for other PDF creation software that can accomplish what you desire.
    There are many cheaper  PDF creation alternatives other than Adobe's Acrobat Pro software.
    Also, try doing a web search under these terms to see if you can find an app/software/solution that may work for you.
    How to create table of contents in PDF files

  • How can i get only the headings and their subheadings from word file in C#

    I want to get all the headings along with their sub-headings separately from a Word file programmatically using C#. For example, I have the following content:
    HEADING 1 XYZ
    heading 2
    heading 3
    HEADING 1 ABC
    HEADING 1 DEF
    heading 2 lorem ispum
    so my code should return me:
    Heading 1 XYZ
    heading 2
    heading 3
    separately, and similarly for the remaining headings and subheadings.
    Bilal Amjad Microsoft Certified Professional

    Did you try using the OpenXml SDK to access the Word *.docx file in .NET?
    With the OpenXml SDK you can easily get anything in Word (Office) documents.
    Bob Bao
    I am using the following code to get the output, but it returns all the headings and subheadings together, not separately:
    foreach (Microsoft.Office.Interop.Word.Paragraph paragraph in oMyDoc.Paragraphs)
    {
        // Look up the paragraph's style and branch on its local name.
        Microsoft.Office.Interop.Word.Style style = paragraph.get_Style() as Microsoft.Office.Interop.Word.Style;
        string styleName = style.NameLocal;
        string text = paragraph.Range.Text;
        if (styleName == "Title")
            title = text;
        else if (styleName == "Subtitle")
            st = text + "\n";
        else if (styleName == "Heading 1")
            heading1[h1c] = text + "\n";
    }
    Bilal Amjad Microsoft Certified Professional
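
    One way to keep each heading together with its own sub-headings is to start a new group every time a "Heading 1" paragraph is seen and append lower-level headings to the most recent group. The sketch below shows only that grouping step, in Python, on plain (style name, text) pairs rather than on Word interop objects; the function name and the input shape are assumptions made for this illustration.

```python
def group_by_top_heading(paragraphs):
    """Split (style_name, text) pairs into one list per 'Heading 1'."""
    sections = []
    for style_name, text in paragraphs:
        if style_name == "Heading 1":
            # A top-level heading opens a new group.
            sections.append([text])
        elif style_name.startswith("Heading ") and sections:
            # Lower-level headings belong to the most recent top-level one.
            sections[-1].append(text)
    return sections
```

    Non-heading paragraphs, and sub-headings that appear before the first "Heading 1", are ignored by this sketch.
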

  • Table headings and search text in OVS

    Hi
    Is there a way by which the column headings in the results table and the text for the search criteria can be changed in the OVS.
    Regards
    Pran

    In the custom controller for the OVS,
    probably in the init() method, add code similar to the following:
    ISimpleTypeModifiable stype =
    wdContext.node<model>().getNodeInfo().getAttribute("<ModelAttribute>").getModifiableSimpleType();
    stype.setFieldLabel("<New Field Label>");
    stype.setColumnLabel("<New Column Label>");
    Cheers,
    Sam Mathew

  • Webutil and arabic text in local files

    We have deployed webutil successfully at our site. Arabic works properly in the forms. However, when we try to use the webutil functionality to generate text/Word/Excel files, the Arabic text comes out garbled. Any help in this regard will be appreciated. Thanks in advance.

    Dear Mr. Vanayak
    I am also interested in saving information in Urdu or Arabic in forms but have not succeeded. I changed my NLS_LANG to .UTF8 but it did not work.
    What steps do I have to follow for inserting, saving and retrieving data in Arabic or Urdu?
    Please help me.
    Thanking you in Advance
    Sayeed

  • Find and Replace text in several files

    Can automator do that?
    I made a website in iweb and as the ones of you that had already tried it must know, it is impossible to change the font format on the links, you have to hack the code manually, changing the .css files.
    I think a trick would be to get Automator to replace the code automatically: to look for a string of text inside a group of files, replace it and save. I just don't know how to do it. Can anybody help?
    The most I have been able to do is to get Automator to find all .css files within a folder, including all subfolders, and open them in TextEdit.

    I doubt anyone is going to download and look at your code. Please remember that anyone who helps you here is a volunteer, and so the onus is on you to make helping you as easy as possible to do. That means you must take the effort to pare your code down to the bare minimum that shows your problem and compiles, and then post this code here. If you do decide to do this, please use code tags by highlighting your code after pasting it, and then pressing the Code button just above the editor window.
    Good luck.
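
    Outside Automator, the "look for a string inside a group of files and save" step can be sketched in a few lines of Python. This is an illustration only: it assumes the .css files are UTF-8 encoded, and the function name replace_in_css is made up for this example. Work on a copy of the site folder, since the files are overwritten in place.

```python
from pathlib import Path


def replace_in_css(folder, old, new):
    """Replace old with new in every .css file under folder (recursively).

    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(Path(folder).rglob("*.css")):
        original = path.read_text(encoding="utf-8")  # assumption: UTF-8 files
        updated = original.replace(old, new)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

    For example, replace_in_css("MySite", "font-family: Arial", "font-family: Georgia") would rewrite only the files that contain the first string; the folder name and font values here are placeholders.
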

  • How do I find and replace text in PHP files?

    How can I in CS3 make sitewide changes to the text in PHP pages without changing variable names etc that have the same name?
    For example if I have an installation of a PHP forum and I want to change every instance of the word 'forum' to 'message board'...
    If I used the 'inside tag' search with " as the tag, then if "" contained a variable called 'forum' it would also be changed and therefore corrupt the code....
    Is there a simple way around this?
    Thanks!
    I'm using CS3 on Windows Vista.

    It looks like you're trying to find and replace source code, so you may be able to look at the various places that are looked at when finding and uncheck the ones that don't apply.
    But if it's all source code, then that won't help. One thing that may work is to expand the search term. For example, if the word "forum" that you want to change is preceded by another word, a character, or something else that sets it apart, then run your find on that. You can expand the search phrase as far in either direction as you need to make it distinct, if of course that is practical in your situation.
    The only other way I can think of is to somehow create an exception rule, but I'm not sure if that's possible or how to do it.
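
    As a rough illustration of such an exception rule, a regular expression can skip occurrences that are part of a PHP variable or a longer identifier. The Python sketch below is a heuristic, not a PHP parser, and the function name is invented for this example: it leaves $forum, $forum_id and myforum alone, but it would still change a bare array key, a function name or an object property (after ->) that is exactly "forum", so review the result before saving.

```python
import re


def replace_word_outside_variables(source, old, new):
    """Replace old with new unless it follows '$' or sits inside an identifier."""
    pattern = re.compile(r"(?<![$\w])" + re.escape(old) + r"(?!\w)")
    # A function replacement keeps any backslashes in `new` literal.
    return pattern.sub(lambda _match: new, source)
```

    The match is case-sensitive, so "Forum" at the start of a sentence would need a second pass.
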

  • DW 8.02 and link target (Point to File) tool

    After upgrading to IE 7 and DW 8.02, I have experienced
    problems with the target tool for links, in the properties bar, not
    functioning. (Please note: I am referring to the bullseye that is
    dragged to items in the files panel in order to set the link
    destination, not the drop-down labeled target for use with frames.)
    When I drag the bullseye to point to a file while within a
    link or while having text highlighted, either the previous link in
    the code has its href changed or there is no change at all. This is
    happening when I'm in code view. Because this is a ColdFusion page,
    the Design view does not display some of the dynamic content of the
    page for it to be edited. (That's not the problem I'm experiencing,
    just the reason I can't use Design view to solve the problem.) It
    appears that in the tag tree above the results/properties panel,
    it's not recognizing that the cursor is within an <a> tag,
    which is probably why the bullseye-dragging isn't working.
    I'm using Windows XP SP2.
    Thoughts? Suggestions? I can code this manually while I work
    on the solution, but would be nice to be able to use the bullseye
    tool again. It's probably one of my favorite parts of the
    Dreamweaver interface.
    Of course the best solution would be to convince the higher
    powers that this means we should upgrade to CS3. :) But until
    then....

    You don't really need to insert any of them. Just click
    through....
    Murray --- ICQ 71997575
    Adobe Community Expert
    (If you *MUST* email me, don't LAUGH when you do so!)
    ==================
    http://www.dreamweavermx-templates.com
    - Template Triage!
    http://www.projectseven.com/go
    - DW FAQs, Tutorials & Resources
    http://www.dwfaq.com - DW FAQs,
    Tutorials & Resources
    http://www.macromedia.com/support/search/
    - Macromedia (MM) Technotes
    ==================
    "Prajnaparamita" <[email protected]> wrote
    in message
    news:e3t4vg$33g$[email protected]..
    > Now when I try to insert an object as a .swf file DW8.02
    asks me to define
    > the
    > tag Object with TITLE, ACCES KEY and TAB INDEX.
    > I just don't know which parameter i have to insert.
    > Can anybody help me??????
    >
    >

  • How to copy Report(including text symbols and Selection text) in Local file

    Hi,
    I have developed a report in my testing system (a company dummy system) which contains a lot of selection texts and text symbols.
    Is there any program that will copy that program along with all its selection texts and text symbols and save it to a folder, so that I can upload it again from the same program or a different one?
    for example SAP has standard program <b>RSTXSCRP</b> from which i can export and import SAP Script to and from local file or system.
    Please let me know at the earliest.
    Regards,
    Prasanna

    HI,
    That option is available in SAP only for scripts.
    For reports, you can copy the report to another name, or else release the request number and transport it from one system to another.
    The final option is to save it to a local file (green icon); then, in the system where you want to upload the report, create a new report and upload it, including the selection texts.
    Thanks,
    Shankar

  • Creating a drop down list and linking it to Excel files

    How to create a drop down list and when i select any row in the dropdown list, it must open designated MS- Excel file? Can anyone help me in this with an example code?
    Thanks
    Anu

    Try this in 7.0 format.
    Attachments:
    Listbox for excel files.vi ‏15 KB

  • Urgently need EwsJavaApi example to read email from exchange, and print text attachment to file.

    Hello,
    I have the code below running using the EwsJavaApi. I need to modify it to read the first email that matches the search string, and then save its attachment to a file. Thank you in advance!
    public class EWSDemo {
        /**
         * @param args
         */
        public static void main(String[] args) throws Exception {
            ExchangeService service = new ExchangeService();
            // Provide credentials
            ExchangeCredentials credentials = new WebCredentials("[email protected]", "yubbybuddy");
            service.setCredentials(credentials);
            // Set Exchange web service URL
            service.setUrl(new URI("https://outlook.office365.com/EWS/exchange.asmx"));
            // Get five items from the mailbox
            ItemView view = new ItemView(5);
            // Search Inbox
            FindItemsResults<Item> findResults = service.findItems(WellKnownFolderName.Inbox, "Subject:ROGERDODGER123", view);
            // Iterate through the items
            for (Item item : findResults.getItems()) {
                System.out.println(item.getSubject());
                AttachmentCollection myAttachments = item.getAttachments();
                System.out.println(myAttachments.);

    >> I need to modify it to read the the first email that matches the search string
    By "first", do you mean the newest email to arrive or the oldest email in the mailbox that matches that string?
    Exchange returns items from newest to oldest by default. If you want the older messages, you can use the OrderBy property
    http://msdn.microsoft.com/en-us/library/microsoft.exchange.webservices.data.itemview.orderby(v=exchg.80).aspx which should give you the oldest item
    Cheers
    Glen

  • How do I edit and add text to a file on my android

    I need to fill out a form that was sent to me. I also want to be able to make a PDF first.

    Hi timsdream:
    I don't think this is a Firefox question. I don't think you can edit PDFs in Android from a browser if that is what you are asking or are you asking about text editing on Android?
    Anyhow if it is a Firefox question please let us know the usual troubleshooting stuff:
    * what version of Firefox
    * what version of Android
    * what you did, what you expected, what happened
    * any other details that might help!
    ...Roland

  • How to get screen reader to read headings and text in accessible form

    Hi
    We're trying to build an accessible form using Designer 7.0.
    When the user tabs to a text field, radio button, combo, etc, the reader (in this case, Windows XP default "Narrator") correctly reads out the widget's caption and current value.
    However, it skips any of the text between widgets, such as headings and explanatory text - even though these do actually have a valid tab order within the document.
    Is there any way to tell Adobe Reader to either:
    a) Tab to "text" fields as well as to data entry widgets
    b) Read out surrounding text when you tab to a data entry widget.
    c) Anything else that will help end-users to understand the "context" in which they have to enter a piece of data.
    Many thanks,
    Howard

    I played with this a bit to see what could be done. I think this will become more of an issue of whether there is functionality in Narrator that will let you do this.
    Firstly, it is not possible to have Reader tab to a static text field. Regardless of whether it's defined in the tab order, it won't happen.
    In Narrator I wasn't able to get it to read off static text. JAWS has functionality that lets you use the arrow keys to read the text lines around the current cursor position by pressing up and down. By doing that, it did read the static text.
    The only way I could see to get it to happen in Narrator was to use a read-only text field. But it wasn't great, since when Narrator reads it out it explicitly says that it's a read-only text field instead of just reading the text.
    Chris
    Adobe Enterprise Developer Support
