ClearScan - buggy OCR - arbitrary spaces in words

I've got 400 pages from a document, scanned at 600 dpi in color. With Acrobat 9 I converted them into one PDF; internally, all pages are stored as high-quality JPEG 2000. The file size is about 350 MB.
If I run OCR on this file (Acrobat 9 or 10) without ClearScan, everything is OK.
If I run OCR on this file (Acrobat 9 or 10) with ClearScan, the text looks perfect. But if I copy and paste the text into Notepad (the Windows text editor), most words contain spaces: I get "Abcd ef g hijklmnop" instead of "Abcdefghijklmnop". Of course, that also means I cannot find "Abcdefghijklmnop" when I search for it. Hence over 90% of the OCR result is useless, because it is not searchable.
Now, for test purposes, I exported one page (from the original 350 MB non-OCR'd PDF) to a new PDF. I ran OCR on this one-page file (with Acrobat 9) with ClearScan, and the problem above did not appear: no arbitrary spaces, and the text was OK and searchable.
Has nobody else noticed this behaviour, even though it affects both Acrobat 9 and 10?
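
A possible stopgap is to search text copied out of such a file (for example into Notepad) with a pattern that tolerates whitespace between letters. Below is a minimal Python sketch. It assumes the OCR text is already available as a plain string; `spaced_pattern` and `find_spaced` are hypothetical helper names, and the sketch does not change how Acrobat's own search behaves. Like any whitespace-blind search, it can also match across genuine word breaks.

```python
from __future__ import annotations

import re


def spaced_pattern(word: str) -> re.Pattern:
    """Compile a pattern that matches `word` even if whitespace was inserted
    between any of its characters (e.g. "Abcd ef g hijklmnop")."""
    letters = [re.escape(ch) for ch in word if not ch.isspace()]
    # \s* between every pair of characters absorbs the spurious OCR spaces.
    return re.compile(r"\s*".join(letters), re.IGNORECASE)


def find_spaced(text: str, word: str) -> list[str]:
    """Return every occurrence of `word` in `text`, exactly as it appears there."""
    return [m.group(0) for m in spaced_pattern(word).finditer(text)]


if __name__ == "__main__":
    ocr_text = "Abcd ef g hijklmnop and Abcdefghijklmnop"
    print(find_spaced(ocr_text, "Abcdefghijklmnop"))
    # prints: ['Abcd ef g hijklmnop', 'Abcdefghijklmnop']
```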

I think I found the solution to your problem. Just save the problematic ClearScan OCR document as Text (Plain). Simply by doing so, Acrobat Pro corrects the text and eliminates all the spurious spaces between characters. Then you only need to save the PDF document again, and you will see (even if it is hard to believe) that it is already fixed. At least it worked very well for me!

Similar Messages

  • iOS Cangjie input method cannot be used: the original word-selection function of the Space Bar no longer works. Please fix this as soon as possible; it affects most users in Taiwan and Hong Kong.

    No PR1.2 firmware update, no update for Nokia Messaging, no updates for Ovi Maps... Why is the N900 the forgotten stepchild?
    "PR1.2 is still being tested - http://twitter.com/luovanto/statuses/13087245356"
    Hong Kong Launch http://www.youtube.com/watch?v=pY8zCBzvQLo

  • Find the space between a word with different endings and a digit

    Hi,
    I'm trying to figure out how to find the space between a word and a digit.
    I am limited to the word 'WORD' (uppercase or lowercase) with different endings such as -ing, -s, -y and more (using \S+).
    I wrote something like (?i)(?<=WORD\S+)\s(?=\d+), but this does not seem to work, I believe because of some limitation of lookbehind?
    Any suggestions?
    Peter

    Peter Stnsz wrote:
    … find what: (\<WORD\S+)(\s)(?=\d)
    change to: $1~s
    Hmmh?
    Your GREP does not find your first example: WORD 0,2 (WORD without any ending).
    And you don't need the second pair of parentheses.
    And please do not use \S.
    Use this instead:
    (\<WORD\l*)\s(?=\d)
    (the l in \l is a lowercase L)
    Have fun
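    Below is a minimal Python sketch for anyone testing the same idea outside InDesign (the `~s` nonbreaking space suggests that is the GREP flavour here). Python's `re` has no `\<` or `\l`, so `\b` and `[a-z]*` stand in for them, and `re.IGNORECASE` covers upper and lower case. U+00A0 is assumed to be the equivalent of `~s`, and `bind_word_to_number` is a hypothetical name. The sketch also shows why the original `(?<=WORD\S+)` attempt fails: Python, like many regex engines, rejects a lookbehind whose width is not fixed. That is why the suggested pattern captures the word in a group instead.

```python
import re

# Python translation of the suggested GREP (\<WORD\l*)\s(?=\d):
#   \<   (start of word)     -> \b
#   \l*  (lowercase letters) -> [a-z]*, widened to any case by re.IGNORECASE
# The lookahead (?=\d) checks for a digit without consuming it.
PATTERN = re.compile(r"(\bword[a-z]*)\s(?=\d)", re.IGNORECASE)

NBSP = "\u00a0"  # assumption: stands in for InDesign's ~s (nonbreaking space)


def bind_word_to_number(text: str) -> str:
    """Replace the space between WORD (plus any ending) and a digit with a nonbreaking space."""
    return PATTERN.sub(lambda m: m.group(1) + NBSP, text)


def variable_lookbehind_rejected() -> bool:
    """Return True if re refuses the original variable-width lookbehind attempt."""
    try:
        re.compile(r"(?i)(?<=WORD\S+)\s(?=\d+)")
    except re.error:  # "look-behind requires fixed-width pattern"
        return True
    return False


if __name__ == "__main__":
    sample = "WORD 0,2 and Wording 5 but words here"
    print(bind_word_to_number(sample).replace(NBSP, "~s"))
    # prints: WORD~s0,2 and Wording~s5 but words here
```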

  • SAPscript - Repeated spaces between words

    Hi friends.
    I'm working on a ZMEDRUCK SAPscript form, and when printing a line item, the item text (field EKPO-TXZ01) is printed with extra spaces between words.
    E.g.:
    Original value in EKPO-TXZ01: material test.
    Printed: material      ...(two or more spaces)...       test.
    This also happened with another text variable, but stopped happening when I formatted it with <K> text </>.
    But in this case, it's not working. I'm using the standard IT paragraph format.
    What can I do to remove the repeated spaces?
    Edited by: Glauco Kubrusly on Mar 24, 2011 5:02 PM
    Edited by: Glauco Kubrusly on Mar 25, 2011 4:27 PM

    Hi Atul.
    I have already tested the (C) option on this variable, and the problem persists.
    Other texts have the same problem. Literal words in the address in the header have the same problem too.
    I've also tried copying standard paragraphs, but without success.
    A consultant friend suggested testing a new paragraph format with LEFT alignment only and LEFT alignment for all tab stops. I'll try that and give feedback here.
    Any other ideas?
    Thanks.
    Glauco
    Edited by: Glauco Kubrusly on Mar 27, 2011 10:16 PM

  • Office 2010 spaces between words not consistent

    Hello all,
    I recently switched from Office 2003 to Office 2010. Since the switch I notice that the spaces between words are not consistent in length.
    I selected the spaces individually, and even in the selection you can see differences in the width of the spaces.
    This example is from a PowerPoint presentation using Arial, 18 pt.
    Things I've tried so far:
    I've tried disabling kerning, but that makes the font look really bad. I also tried adjusting the character spacing, and while this does solve the issue case by case, I don't have the time to go over every space in a document or presentation to fix this issue.
    The text is left-aligned, and I've verified that there is only one space between the words.
    I tried the same document on two different PCs, one running Windows 7 and the other Windows 8.1. Both have this issue with Office 2010. Two different computers using Office 2013 do not have this spacing issue.
    Does anyone know if this is a known issue with Office 2010? Is there a fix?
    Thanks in advance,
    Zjef

    Hi Zjef,
    Generally, in Word, spaces should be consistent for all text, unless your paragraph is set to "Justify", which stretches the text to fill the entire width between the margins.
    So please first check whether you have this option selected for your text.
    Also, have you tried printing it out to check whether it looks the same as on the screen? Sometimes Word adjusts the spacing to improve on-screen readability, which varies by screen size, video card, etc.
    Regards,
    Ethan Hua
    TechNet Community Support
    It's recommended to download and install the Office Configuration Analyzer Tool (OffCAT), which is developed by Microsoft Support teams. Once the tool is installed, you can run it at any time to scan for hundreds of known issues in Office programs.
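    The "Justify" point can be made concrete. When a line is stretched to the margins, the leftover width has to be shared among the gaps between words. If that width can only be handed out in whole units (characters here, roughly whole pixels on screen), some gaps end up one unit wider than others. Below is a minimal Python sketch on a fixed-width character grid. `justify_line` is a hypothetical helper, not how Word or PowerPoint lay out text internally.

```python
from __future__ import annotations


def justify_line(words: list[str], width: int) -> str:
    """Pad one line to exactly `width` characters by spreading the extra
    spaces over the gaps between words (full justification)."""
    if len(words) == 1:
        return words[0].ljust(width)
    gaps = len(words) - 1
    total_spaces = width - sum(len(w) for w in words)
    if total_spaces < gaps:
        raise ValueError("words do not fit on one line with at least one space per gap")
    # Every gap gets `base` spaces; the leftmost `extra` gaps get one more,
    # because the remainder cannot be split into fractions of a character.
    base, extra = divmod(total_spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word)
        parts.append(" " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


if __name__ == "__main__":
    print(repr(justify_line(["This", "is", "an", "example"], 20)))
    # prints: 'This  is  an example'  (two gaps of 2 spaces, one gap of 1)
```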

  • How can I write with double spaces between words?

    Hello everyone,
    I need to write in Pages with double spaces between words, not between lines.
    How can I do it and make it the default?
    For now, I just type and press the space bar twice!

    Do you really need to do this? It is so ugly and illegible.
    You can use a monospace font, or use the auto-complete in Menu > Pages > Preferences to automatically substitute two spaces for each one.
    Peter
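    If the text can be post-processed outside Pages, the "two spaces for each one" substitution is a single regular expression. Below is a minimal Python sketch; it is an assumption for illustration, not a Pages feature. A single space between two non-space characters becomes two spaces. Gaps that already contain two or more spaces are left alone, so running it twice does not keep widening them.

```python
import re

# A single space with a non-space character immediately on both sides.
SINGLE_SPACE = re.compile(r"(?<=\S) (?=\S)")


def double_word_spaces(text: str) -> str:
    """Replace each single space between words with two spaces."""
    return SINGLE_SPACE.sub("  ", text)


if __name__ == "__main__":
    print(repr(double_word_spaces("one two three")))
    # prints: 'one  two  three'
```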

  • Regular expression to replace the "empty space" ( ) between words with +

    Hello!
    When I want to find something like this in code:
    12144541 FirstWord SecondWord
    the regular expression for that is:
    (\d{1,100})[\s-]\D{1,100}[\s-]\D{1,100}
    Now, please help me find a regular expression to replace
    the "empty space" ( ) between words with +, so that
    12144541 FirstWord SecondWord becomes
    12144541+FirstWord+SecondWord
    Thank you very, very, very much!

    A simple-minded solution is to use \s to match all
    whitespace; e.g. find \s and replace with +. DW CS3, at least, is
    smart enough to not replace end of line characters with the '+'
    character if you limit your search & replace to text.
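    One caveat applies if the same find/replace is done in a script rather than in Dreamweaver. In Python's `re` (and most regex engines), `\s` also matches line breaks, so replacing `\s` with `+` would join lines together. Below is a minimal Python sketch that replaces only runs of spaces and tabs, optionally limited to the "number word word" shape from the question. The helper names and the `[A-Za-z]+` definition of a word are assumptions for illustration.

```python
import re

# Spaces and tabs only: unlike \s, this character class never matches "\n".
HORIZONTAL_SPACE = re.compile(r"[ \t]+")
# The shape from the question: a number followed by two words.
NUMBER_AND_TWO_WORDS = re.compile(r"\d+(?:[ \t]+[A-Za-z]+){2}")


def spaces_to_plus(text: str) -> str:
    """Replace every run of spaces/tabs with '+', keeping line breaks."""
    return HORIZONTAL_SPACE.sub("+", text)


def plus_join_records(text: str) -> str:
    """Apply the replacement only inside 'number word word' matches."""
    return NUMBER_AND_TWO_WORDS.sub(lambda m: spaces_to_plus(m.group(0)), text)


if __name__ == "__main__":
    print(plus_join_records("id: 12144541 FirstWord SecondWord"))
    # prints: id: 12144541+FirstWord+SecondWord
```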

  • Why is Illustrator CS6 omitting the spaces between words?

    Out of the blue, my Type tool has been acting funny in Illustrator CS6. Generally my text boxes or bounding boxes are outlined in blue. However, now they are black. I'm not sure what instigated the change. Now I am typing in Illustrator and it won't show any spaces between words. Everythinglookslikethis. In the Character panel, the kerning is set to auto and tracking is set to 0. Does anyone know what is going on? What are these black text boxes of death?

    chatterville wrote:
    Out of the blue, my type tool has been acting funny in Illustrator CS6. Generally my text boxes or bounding boxes are outlined in blue. However, now they are black. I'm not sure what instigated the change.
    Creating a new layer changes its selection color. Look in the Layers panel and double-click the layer to change its color.
    .. Now I am typing in Illustrator and it won't show any spaces between words. Everythinglookslikethis. In the Character panel, the kerning is set to auto and tracking is set to 0. Does anyone know what is going on? What are these black text boxes of death?
    Not sure, but try changing the font.
    You can also try the word spacing options under Justification in the flyout menu of the Paragraph panel.

  • When I copy/paste a paragraph from a PDF doc, it pastes with no spaces between words. How can I fix this? I've searched everywhere for a solution but didn't find anyone who had this issue.

    When I copy/paste a paragraph from a PDF doc (into a Facebook status update, for example), it pastes with no spaces between words, e.g.: "The manhadajugofcoolwaterand offeredmeadrink." How can I fix this? I've searched everywhere for a solution but didn't find anyone who had this issue. I'm new to Apple, so any help will be much appreciated. Thanks!

    I can't speak about this occurring with Facebook since I don't use Facebook, but I see the same kind of thing when I copy and paste from websites. It doesn't always happen, and I cannot find any particular set of circumstances under which it does or does not occur.
    I am merely responding to let you know that it happens to me as well and I have seen no way to correct it. I'm not sure there is a way to correct it, in that it may have something to do with how the original text is formatted in the PDF or on the website and how it eventually "fits" into the text field or area in which it is pasted.
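    One likely mechanism behind both this thread and the ClearScan one: a PDF often stores text as individually positioned glyphs without explicit space characters. Copy/paste and search therefore have to guess word breaks from the gaps between glyphs, and when the guess is off, words run together or split apart. Below is a minimal Python sketch of such a gap heuristic. The `Glyph` type, the positions, and the 0.25 × font-size threshold are illustrative assumptions, not the rule used by Acrobat, Preview, or any particular viewer. A threshold that is too high merges words; one that is too low splits them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical positioned glyph: one character on a text line."""
    char: str
    x: float      # left edge, in points
    width: float  # advance width, in points


def join_glyphs(glyphs: list[Glyph], font_size: float, gap_ratio: float = 0.25) -> str:
    """Rebuild a line of text, inserting a space wherever the horizontal gap
    between consecutive glyphs exceeds gap_ratio * font_size."""
    threshold = gap_ratio * font_size
    out: list[str] = []
    prev_end = None
    for g in glyphs:
        if prev_end is not None and g.x - prev_end > threshold:
            out.append(" ")
        out.append(g.char)
        prev_end = g.x + g.width
    return "".join(out)


if __name__ == "__main__":
    # "ab" and "cd" with a 4-point gap between them, 10-point font, 5-point glyphs.
    line = [Glyph("a", 0, 5), Glyph("b", 5, 5), Glyph("c", 14, 5), Glyph("d", 19, 5)]
    print(join_glyphs(line, font_size=10))                 # prints: ab cd
    print(join_glyphs(line, font_size=10, gap_ratio=0.5))  # prints: abcd
```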

  • ASK - How to delete blank lines in a Word document using Oracle Forms

    Hi,
    I have one Word template document.
    Inside the Word document template:
    abcdefg
    [START_TERMS]
    [END_TERMS]
    bbbbbbbbbbbbbbbb
    Currently, my program can only delete the markers [START_TERMS] and [END_TERMS],
    so the result is:
    abcdefg
    ----> space area
    ----> space area
    bbbbbbbbbbbbbbbb
    My expected result is:
    abcdefg
    bbbbbbbbbbbbbbbb
    How can I remove those two blank lines in Oracle Forms?
    Please advise.
    Thanks for your help.
    regards,
    Iwan

    Hi,
    Here is my code:
    Function DeleteWord(MyOwnSelection OLE2.OBJ_TYPE, MyWord VARCHAR2) Return Boolean Is
      MyArgs  OLE2.LIST_TYPE;
      vReturn BOOLEAN;
    Begin
      -- FindWord is a separate helper (not shown) that selects MyWord in the document
      vReturn := FindWord(MyOwnSelection, MyWord);
      -- Can't find the word
      If vReturn = False Then
        Return False;
      End If;
      -- Delete the selected marker text; the paragraph mark after it is not selected, so it remains
      OLE2.INVOKE(MyOwnSelection, 'Delete');
      -- Collapse the selection back to the start of the document: SetRange(0, 0)
      MyArgs := OLE2.CREATE_ARGLIST;
      OLE2.ADD_ARG(MyArgs, 0);
      OLE2.ADD_ARG(MyArgs, 0);
      OLE2.INVOKE(MyOwnSelection, 'SetRange', MyArgs);
      OLE2.DESTROY_ARGLIST(MyArgs);
      Return True;
    End;

    -- Example call
    returnValue := WordFunction.DeleteWord(MySelection, '[START_TERMS]');
    Thanks and regards,
    Iwan
    Edited by: user1888509 on Jun 7, 2011 10:34 PM
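    The leftover blank lines are what you would expect if only the marker text is deleted. In Word, each paragraph ends with a paragraph mark, which `Range.Text` reports as a carriage return (`\r`). Deleting just the characters `[START_TERMS]` leaves that mark behind, i.e. an empty paragraph. Below is a minimal Python sketch that models the document as such a string. It only illustrates the idea and is not Oracle Forms or OLE2 code: removing each marker together with its following paragraph mark gives the expected result.

```python
def delete_marker_text(doc_text: str, marker: str) -> str:
    """Delete only the marker characters; its paragraph mark (a carriage return) stays behind."""
    return doc_text.replace(marker, "")


def delete_marker_paragraph(doc_text: str, marker: str) -> str:
    """Delete the marker together with the paragraph mark that ends its line."""
    return doc_text.replace(marker + "\r", "")


if __name__ == "__main__":
    doc = "abcdefg\r[START_TERMS]\r[END_TERMS]\rbbbbbbbbbbbbbbbb\r"
    text_only, whole_paragraphs = doc, doc
    for m in ("[START_TERMS]", "[END_TERMS]"):
        text_only = delete_marker_text(text_only, m)
        whole_paragraphs = delete_marker_paragraph(whole_paragraphs, m)
    print(text_only.split("\r"))         # two empty paragraphs remain between the lines
    print(whole_paragraphs.split("\r"))  # the two lines are now adjacent
```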

  • Embedded spaces in words

    I am using Adobe Reader version 9.4.5. When I select, copy and paste certain content from a PDF file into another tool, say MS Word, Outlook, or even into Adobe Reader's search box, the words sometimes include embedded spaces. For example, the word "enterprise" is used many times in the document I'm reading. However, when I select, copy and paste content that includes that word, it sometimes appears as "enterprise", sometimes as "enter prise", and sometimes as "enterp rise", etc. This means I cannot reliably search for the word "enterprise" and find all the instances, because some have embedded spaces. For example, I'll select, copy and paste one of the offending words "enterprise" right here:
    enterpr ise
    See how it has the embedded space?
    How can I eliminate these embedded spaces so that I can successfully search the document and find all instances of the desired word, even though some instances have the embedded space?
    Thanks for your help.

    Are you using a fancy, custom font downloaded from some free font website? Your problem sounds like there's simply no space character; the box is the default 'character glyph' used when nothing exists at that position in the font file.
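    For the search problem itself (as opposed to its cause), one approach works when the text can be extracted. Compare with all whitespace removed, while remembering where each remaining character came from, so every hit can still be located in the original text. Below is a minimal Python sketch. `find_ignoring_whitespace` is a hypothetical helper, and like any whitespace-blind search it will also match across genuine word breaks (searching "therapist" also finds "the rapist").

```python
from __future__ import annotations


def find_ignoring_whitespace(text: str, word: str) -> list[tuple[int, int]]:
    """Return (start, end) slices of `text` that spell `word` once all
    whitespace is ignored, e.g. "enter prise" for "enterprise"."""
    target = "".join(word.split())
    if not target:
        return []
    kept: list[str] = []
    origin: list[int] = []  # origin[k] = index in `text` of the k-th kept character
    for i, ch in enumerate(text):
        if not ch.isspace():
            kept.append(ch)
            origin.append(i)
    squashed = "".join(kept)
    spans: list[tuple[int, int]] = []
    start = squashed.find(target)
    while start != -1:
        last = start + len(target) - 1
        spans.append((origin[start], origin[last] + 1))
        start = squashed.find(target, start + 1)
    return spans


if __name__ == "__main__":
    page = "enterprise, enter prise and enterp rise"
    for a, b in find_ignoring_whitespace(page, "enterprise"):
        print(a, b, repr(page[a:b]))
```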

  • XML in Flash Taking out the "space" between words

    I have an XML document that Flash reads to pull random words from. I replace the letters of the words with a character; right now it's a minus sign (-). What I am having a problem with is that I don't want the spaces between the words to be replaced. I want them to be ignored, but still have the space between the words. Example: "big dog" would be (--- ---). Is there a way to have Flash recognize this "space" and not replace it with anything?
    Thanks,
    Luke

    I think the XML aspect is irrelevant here... all you want to do is disguise regular text, right? There's probably a more efficient way to do this, but here's a start anyhow.
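    The core of the masking is a substitution that touches every character except whitespace. Below is a minimal sketch in Python rather than ActionScript, so names and calls differ from what Flash would need. `mask_letters` is a hypothetical name. `\S` matches any non-whitespace character, so letters, digits and punctuation become the mask while spaces stay put.

```python
import re


def mask_letters(text: str, mask: str = "-") -> str:
    """Replace every non-whitespace character with `mask`, keeping the spaces."""
    # A function replacement avoids re.sub treating backslashes in `mask` specially.
    return re.sub(r"\S", lambda _m: mask, text)


if __name__ == "__main__":
    print(mask_letters("big dog"))  # prints: --- ---
```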

  • Mail subject line and Safari Google Search window removing spaces between words

    After upgrading to Lion, I can no longer type a subject line in Mac Mail without the spaces automatically being removed. All words run together. Same situation in Safari's Google Search window. No issues in Firefox. Driving me nuts. Anyone else with this issue?

    Found the answer to this issue. I'm amazed this received no response from the community; it must mean I was the only one with this issue. Amazingly, the exact same thing was happening on three different Macs that I loaded Lion onto. The answer was given to me by a very helpful and SMART tech person at AppleCare. Two other AppleCare techs were going to have me reinstall the entire OS after wiping my computers clean. I finally found someone who knew what they were doing. He simply took me to Apple > System Preferences > Language & Text. Then I selected the Text tab, went to the bottom of the window and clicked Reset Default Settings. Bingo, all was fixed. I did the same thing on each machine, and all are working fine now. I hope typing this up will help others if needed, since not one person responded to this inquiry.

  • Spaces between words in a word document deleted

    In a Word 2010 document, I found all the words without spaces between them. Suggestions?

    in a word document 2010
    You didn't mention the version number of your current Office client.
    Paul mentioned a typical scenario above for this kind of issue; if this is your case, please try updating your current Office installation, then try again.
    If not, please post back with more details, then we'll take a further look.
    Regards,
    Ethan Hua
    Forum Support

  • Missing spaces between words after importing Word document into RoboHelp HTML version 8

    I created a 400+ page document in Word 2007 and imported it into RoboHelp 8 (HTML, not RoboHelp for Word). In many places, the space between a bold word and a non-bold word is no longer there. I would like to avoid having to go through the entire RH document to replace the spaces manually. Did I import incorrectly?

    Looks perfect, works terribly...
    Things to consider:
    Chances are, the different treatments of the space are due to whether the space was included in the bolding of the word in MS Word.
    Additionally, since Word files contain a lot of underlying binary code (macros and such), your security settings may be interfering with the Word content.
    Are you importing the Word file from a network location, or from your local machine? Network traffic might be partially impacting the import, too.
    Bottom line? You'll probably need to clear all formatting in the Word file and redo the formatting properly. To test this, cut a couple of pages into a new Word file, clear and redo the formatting in Word, import the Word file into another RH project, and see what results you get.
    To clear the formatting:
    1. Select the text.
    2. On the Home tab, in the Font group, click Clear Formatting.
    3. Redo the formatting, being careful not to format the spaces.
    Good luck,
    Leon
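    If the missing spaces come from whether the space was inside or outside the bold run, the affected spots in the imported HTML are places where a bold tag touches a word character directly. Below is a minimal Python sketch that lists those candidates, so they can be reviewed rather than hunted for page by page. It assumes the topics use `<b>` or `<strong>` tags; `find_suspect_joins` is a hypothetical name. It will also flag legitimate cases such as a bolded part of a word, so each hit needs a human check.

```python
from __future__ import annotations

import re

# A closing bold tag directly followed by a word character, or a word
# character directly followed by an opening bold tag (attributes allowed).
SUSPECT_JOIN = re.compile(
    r"</(?:b|strong)>(?=\w)|(?<=\w)<(?:b|strong)\b[^>]*>",
    re.IGNORECASE,
)


def find_suspect_joins(html: str) -> list[int]:
    """Return the offsets of bold tags that touch a word with no space between."""
    return [m.start() for m in SUSPECT_JOIN.finditer(html)]


if __name__ == "__main__":
    topic = "a <b>bold</b>word and <b>ok</b> fine"
    for pos in find_suspect_joins(topic):
        print(pos, repr(topic[max(0, pos - 10):pos + 10]))
```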

Maybe you are looking for

  • SQL Server 2005 agent job runs a SSIS package ( Analysis Services Processing Task) fails

     Hi, SQL Server 2005 Standard Edition. I have an SSIS package which has an Analysis Services Processing Task. I have tested the package in BIDS and it runs OK. But when I created an Agent job and ran it from the job, it reported the error: Code: 0xC0012024

  • Anyone using a Laserjet All-In-One on an Intel Mac?

    I just got an HP Laserjet 2820 all in one unit (scanner, printer, copier) for use with my MacBook Pro. My main reason for going with this model was the automatic document feeder (ADF) that allows me to put a stack of documents in and automatically cr

  • Display not sleeping

    After the period of time I've specified to have my display go to sleep it will turn off but then immediately turn back on. The result is that my display doesn't sleep at all. When I'm not using it for extended periods of time (or trying to sleep) I h

  • Arch Linux Handbook

    I am thinking of creating an Arch Linux Handbook much like Gentoo's and Freebsd's excellent ones that go into great detail.  I know there is already a great installation guide, so a couple of questions. Am I just reinventing the wheel, or do you thin

  • How to ensure that the query results are fetched from the BW Accelerator.

    Hi Experts, Suppose if i want to execute the query with BIA option, I can achieve it in RSRT . 1)But is there some query settings or properties where we can make sure that whenever a query(say Q1) is executed, always the data should be fetched from t