Character recognition

Hello!
I would like to write a class that recognizes the text in scanned technical drawings.
Using Google (to find resources and technical information), I have found only OCR software for sale, but I am more interested in pages that explain how the recognition process works (examples in Java would be even better).
Can you give me some pointers on how character recognition can be done?
Thank you!

This should give you a nice evening of reading:
http://cgm.cs.mcgill.ca/~godfried/teaching/pr-web.html

Similar Messages

Debian Linux character recognition problem

Maybe this shouldn't be on this forum, but perhaps someone can help me...
I have JRun 3.1, Java 2 1.4 and Debian Linux.
On our test server everything was hunky dory, running perfectly.
But once everything was on the real server, we've had a character recognition problem. All of the higher characters are just replaced by a y or a ?.
I'm fairly sure we can rule out Java: i18n.jar, which I believe holds all the international support, is present and on the classpath (exactly as on the test server), and JRun is set up exactly as on the test server because we copied every folder as it was.
Therefore I think we can bring it down to Linux (Debian).
I cannot find where to look in Debian for any kind of keyboard or locale problem. Anyone know what to do?
kbdconfig command doesn't exist on our system.
Any help greatly appreciated.
We are in Italy, by the way.
Thanks.

If you check the default file encoding, you'll find that it's 7-bit ASCII, so no characters above '\u007F' come out as they should. This applies to compiling as well as to running.
You can use another character encoding through command-line options: "javac -encoding ISO-8859-1 package/YourClass.java" to compile (available at least in 1.4) and "java -Dfile.encoding=iso-8859-1 package.YourClass" to run.
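
To see what the JVM on each server actually uses, something like the following might help. It is only a minimal sketch: the class name DefaultEncodingCheck and the sample text are made up for illustration, and it assumes a JVM recent enough to have java.nio.charset.StandardCharsets (Java 7 or later, so not the 1.4 from the question).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class DefaultEncodingCheck {

    // Encodes the text to bytes and decodes it again with the same charset.
    // Characters the charset cannot represent are replaced (with '?' for US-ASCII).
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // What the JVM believes the platform encoding is.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Sample text with accented letters, written with escapes so the
        // source file itself stays pure ASCII.
        String sample = "perch\u00e9 citt\u00e0";
        System.out.println("survives ISO-8859-1: "
                + sample.equals(roundTrip(sample, StandardCharsets.ISO_8859_1)));
        System.out.println("survives US-ASCII:   "
                + sample.equals(roundTrip(sample, StandardCharsets.US_ASCII)));
    }
}
```

Running it on both the test server and the real server and comparing the first two lines should show whether the two JVMs start with different default encodings.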

OCR (Optical Character Recognition)

I have just bought this new printer/scanner. I am trying to use the OCR (Optical Character Recognition) text converter that came with my old printer. When I scan as a photo, the output comes out as several pictures and all the text is missing. When I scan as a document, the output is a PDF, and the OmniPage text converter is of no use. Does anyone have any suggestions? Or should I just buy a new ink cartridge for my old Canon and return this HP to Costco? Many thanks.

Your answer is much appreciated. I'll get round to trying the solution when I can whip up enough enthusiasm. Right now I'm a bit fed up with trying to get the machine sorted out! Probably not so much the machine as the operator; it's way too hot here today, 33 degrees Celsius. I am a bit unsure whether it will do the trick, but it's the best answer so far! Thank you.

Character recognition results in question marks using CuteFTP

I'm having trouble with character recognition when I upload my site to Network Solutions using CuteFTP. Some returns, apostrophes, etc. appear as question marks. I have tried changing the preferences to MacBinary and to binary without any change. Any suggestions?

For my own education...
Would my solution have worked? I realize that it's not a true fix. It's a workaround. However, I'd like to know if it would have made a difference.
Yes, if you confine your text to pure ASCII, without any "smart" punctuation, that may work. However, the templates often already contain other non-punctuation characters that cause the same kind of problem, and removing all of them could be really tedious.
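
For anyone who wants to automate that workaround, here is a rough sketch in Java. The class AsciiPunctuation and its short replacement list are assumptions made for this example; real templates may contain other non-ASCII characters that this does not cover, which is what the isPureAscii check is for.

```java
// Hypothetical helper class, named only for this example.
class AsciiPunctuation {

    // Replaces a few common "smart" punctuation marks with ASCII equivalents.
    // The list is deliberately short and not exhaustive.
    static String toAscii(String text) {
        return text
                .replace('\u2018', '\'')   // left single quote
                .replace('\u2019', '\'')   // right single quote / apostrophe
                .replace('\u201C', '"')    // left double quote
                .replace('\u201D', '"')    // right double quote
                .replace("\u2013", "-")    // en dash
                .replace("\u2014", "--")   // em dash
                .replace("\u2026", "..."); // ellipsis
    }

    // True when every character is in the 7-bit ASCII range.
    static boolean isPureAscii(String text) {
        return text.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String cleaned = toAscii("It\u2019s \u201Cfine\u201D\u2026");
        System.out.println(cleaned + " pure ASCII: " + isPureAscii(cleaned));
    }
}
```

If isPureAscii still returns false after the replacement, the page contains other characters that would need to be handled by hand or by declaring the right encoding.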

Character recognition using java

Hi guys,
I have a fairly complex problem that I need to solve. Basically I am reading a set of pixel colours from a 3rd party client. I need to take this pixel data and recognize the characters used in it.
Does anybody know of any good character recognition tutorials I can use?
Has anybody ever done anything like this?
Any help will be great
Thanks
Alex

Over the last month or so I have checked out those links and the downloadable software they offer; however, you can only use their trial versions, which aren't suitable for me.
Does anybody know where I should start if I want to write my own OCR functionality? Are neural nets the best way to go?
Any advice/suggestions will be great.
Thanks
alex
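
Not a full answer, but one very simple place to start before looking at neural nets is nearest-template matching: compare each glyph's pixels against a stored pattern for every known character and pick the closest. The sketch below assumes the glyph has already been cut out and scaled to a 5x5 black-and-white grid; the class TemplateMatcher, the three templates and the '#'/'.' pixel notation are invented for this example and are nowhere near a real OCR engine.

```java
// Hypothetical toy classifier, named only for this example.
class TemplateMatcher {

    // One 5x5 template per known character; '#' is ink, '.' is background.
    static final char[] LABELS = {'I', 'L', 'T'};
    static final String[][] TEMPLATES = {
        {"..#..", "..#..", "..#..", "..#..", "..#.."},   // I
        {"#....", "#....", "#....", "#....", "#####"},   // L
        {"#####", "..#..", "..#..", "..#..", "..#.."},   // T
    };

    // Counts the pixels that differ between two grids of the same size.
    static int distance(String[] a, String[] b) {
        int diff = 0;
        for (int row = 0; row < a.length; row++) {
            for (int col = 0; col < a[row].length(); col++) {
                if (a[row].charAt(col) != b[row].charAt(col)) {
                    diff++;
                }
            }
        }
        return diff;
    }

    // Returns the label of the template with the fewest differing pixels.
    static char recognize(String[] glyph) {
        int best = Integer.MAX_VALUE;
        char label = '?';
        for (int i = 0; i < TEMPLATES.length; i++) {
            int d = distance(glyph, TEMPLATES[i]);
            if (d < best) {
                best = d;
                label = LABELS[i];
            }
        }
        return label;
    }

    public static void main(String[] args) {
        // A 'T' with one pixel missing from the top bar.
        String[] noisy = {"####.", "..#..", "..#..", "..#..", "..#.."};
        System.out.println(recognize(noisy));
    }
}
```

Real pixel data would first need thresholding, segmentation into individual characters and size normalization, and a trained classifier will usually cope better with varied fonts and noise than fixed templates.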

How to train OCR using VISION ASSISTANT for multiple character recognition

Sir, I have tried training OCR using Vision Assistant for character recognition. For the process I used a fixed-focus camera, but the characters I had trained were not detected. Please suggest a reliable solution to the problem.
Thank you.
I have attached my project description and also the .vi file of my work towards it.
Attachments:
Project phase I.vi (138 KB)
WP_20140814_17_27_38_Pro.jpg (1444 KB)

Can you post a real jpg instead of renaming a bmp to jpg?

Optical character recognition--how do I scan a document to text I can edit.

Optical character recognition--how do I scan a document to text I can edit?

You first scan the document to a PDF. Then you run OCR from within Acrobat. Depending on your setup, the OCR may be run automatically.
If you want to edit so the changes show on the screen, you will need to use ClearScan. The other two searchable options put the OCR text in a layer behind the image of your original document.

What OCR (optical Character recognition) software is compatible with iMac?

What OCR (optical Character recognition) software is compatible with iMac?
I am using OS X 10.9.1 on my iMac with an Epson CX8600 printer/scanner.

My new Brother multifunction came with Presto PageManager. I can't recommend it in the least. Its OCR is terribly inaccurate, to the point of being unusable in my tests. Even normal sentences came out mashed together.
I wanted to try IRIS, but it has no test version, just videos, that I could find, and emails to them just led to a Windows demo that is irrelevant. They ignored my email asking about a Mac demo, so I could not even try it out. That pretty much knocks it out of the running for me as this software is too expensive to just buy based on their promotional videos alone.
Then there's VueScan. It's not too bad for OCR and supports just about every device you could conceive of using, which is a plus. It was fairly accurate, but not as accurate as FineReader Pro. The interface is a bit complex for my taste.
I've been testing out ABBYY Finereader Pro, and they do have a demo, and so far it seems to be the most accurate of what I've been able to test. It gets pretty close to everything on English and Spanish, and supports most common languages (check the list on their website). It even properly recognized syllables of phonetic Tibetan (using Roman alphabet, but it will not detect Tibetan). The UI is pretty good, but not perfect and there's no way to train for odd words you might need. I haven't tried it for Chinese or Japanese, but for my purposes, unless I find something more economical, I'll buy this one for its accuracy.
I really suggest you download the demos and try them for yourself, as none of these are particularly cheap.

Any Java API available for character recognition.. Please help

Hi,
I am wondering if there is any Java API for identifying the character encoding of text content. I came across NGramJ, but there is not enough documentation to integrate it with my application. Any help would be appreciated.
Thanks.

I used this, but the CharsetToolKit distinguishes only among UTF-8, UTF-16LE and UTF-16, not other encodings such as TIS-620. I am new to this as well, so I am not sure whether I am doing it right. Please advise.
Also, any samples of chardet would be appreciated.
One thing I am not sure about: when I send a message containing Thai characters from Hotmail to one of my Exchange accounts, with my browser set to Thai encoding (TIS-620) but my Hotmail account language set to English, the message looks like gibberish in Outlook.
So I need the charset detector to tell me which encoding was used for the content (if you choose English as the language option, the Hotmail server does not include a charset parameter in the Content-Type header) so that I can decode it and re-encode it to UTF-8.
Any immediate response would be appreciated.
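
As a rough illustration of one approach that needs only the JDK, you can try strict decoding against a list of candidate charsets and take the first one that decodes without errors. This is a sketch under assumptions, not a real detector like chardet: the class CharsetGuess and its method names are made up for this example, and single-byte charsets such as ISO-8859-1 accept any byte sequence, so the order of candidates matters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, named only for this example.
class CharsetGuess {

    // True if the bytes decode under the charset with no malformed or
    // unmappable sequences (errors are reported instead of replaced).
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns the first candidate name that is supported by this JVM and
    // decodes the data cleanly, or null if none does.
    static String firstMatch(byte[] data, String... candidateNames) {
        for (String name : candidateNames) {
            if (Charset.isSupported(name)
                    && decodesCleanly(data, Charset.forName(name))) {
                return name;
            }
        }
        return null;
    }
}
```

A call such as CharsetGuess.firstMatch(bytes, "UTF-8", "TIS-620") would try UTF-8 first and fall back to TIS-620 only if the JVM in use supports that charset. A clean decode only shows the bytes are valid in that encoding, not that it is the one the sender used, so a statistical detector is still the better tool when several single-byte encodings are possible.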

Cannot perform OCR (character recognition) on a pdf

While attempting to perform OCR I get the following message:
"Unable to process the page because the Paper Capture recognition service unexpectedly terminated".
I've tried JPEG files on RGB, Grayscale, 600 dpi, 400 dpi, 300 dpi, but it keeps on not performing OCR. How do I make OCR work? Thanks.

Use TIFF files to get OCR to work.

Pdf ifilter OCR (Optical Character Recognition) 64 bit SharePoint

Folks,
We have 3 windows server 2008 (2 front end and 1 index) 64 bit MOSS servers and 1 SQL Server 2008 64 bit server in our SharePoint environment.
We also have pdf iFilter 64 bit version from Adobe and we can successfully search PDF content. http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025
However, scanned/OCR'ed PDF content is not returned when indexing with the PDF iFilter 9 (64 bit). I then followed http://blogs.msdn.com/ifilter/archive/2007/03/29/indexing-pdf-documents-with-adobe-reader-v-8-and-moss-2007.aspx except with Adobe Reader 9, which does index PDF files that are OCR'ed.
Also, this appears to be the 32-bit version of Reader, and I couldn't find a 64-bit version of Adobe Reader; however, it works great. I can successfully search scanned PDF content from the MOSS search interface.
My question is
1. What is the difference between the two iFilters?
2. Am I doing something that might significantly affect index server performance?
3. Why is OCR not working with the 64-bit iFilter version from Adobe?
Any thoughts would be great.
Thanks
Parthi

Parthi,
Since the crawler just uses the registered iFilter for returning text, you should ask Adobe why there is a difference.
Matthew McDermott, MVP MOSS

Every time I try to perform a character recognition on a PDF on my Mac, to make the PDF searchable, it crashes. Latest OS X, latest Acrobat, brand new Mac.

I am having trouble making PDFs searchable. I often receive bundles of scanned PDFs. I can combine them. I then try and make the resulting file searchable and the process always crashes after a few hundred pages. I have reduced the DPI output from 600 to 300 but that does not help. Running Mavericks on the latest trial Acrobat on a 2014 Macbook Pro.

Hi crimlaw,
Please update Acrobat to the latest patch, i.e. v11.0.07, and check the performance again.
Regards,
Rave

Table of contents from lists or character styles

I have a legal document and I want to create a table of contents that includes not just headings (Article I, II, III, etc.) but also the subheadings (e.g. Article IV.3), which were created as numbered lists. If I give the list item a paragraph style, it puts the whole text into the TOC. If I put a carriage return after a list heading, it turns the body text into a new number in the list. Is there some way to get either lists or character styles into the TOC? Can you have an in-line paragraph style? Here's what the document looks like:
Article VII Heading
List Heading. body text...
a)   Sublist heading body text
b)   Sublist heading body text
List Heading. body text...
Is there any way to do this so that the TOC reads something like this:
Article I   Heading              ....... page 2
          2. List Heading          ....... page 2
            a) Sublist Heading   ....... page 2
            b) Sublist Heading   ....... page 2
          3. List Heading          ....... page 3
Article II                              ....... page 3
etc. ??
I could do this manually, but this document is being edited now and again, and I don't want to have to change the TOC every time we make a small change to the document.
Thanks,
Brendan

The numbering and lettering are automatically generated by the List I selected.
It's been a few months since I did this, so I'm trying to remember exactly how the process went.
I've edited Bylaws and Constitutions like this several times for several non-profit organizations over the years, first on MS-DOS with floppy-discs using WordPerfect, later Windows and MS-Word, and now Mac OS-X and Pages '09. With each new iteration of software it keeps getting less painful, but it's still not a piece of cake.
This time, I began with scanned images of the last printed original copy (2008) for a document that no longer existed in any original computer format (1994). I imported the scans into Optical Character Recognition software included with my Canon printer/scanner.
I was determined NOT to re-type the whole document from scratch, so the editing I describe WAS time-consuming and a bit tedious, but still a bit less painful than starting from scratch. I'm a volunteer and retired. A paid fast(er) touch-typist working in an office (and their supervisor) might strongly disagree!
After cleaning up a few OCR-generated typos, I was also determined not to manually re-create the outline format and the table of contents if at all possible. Even 1980's WordPerfect on floppy discs could automatically generate an outline and a table of contents from marked text!
I used Lists to generate the desired outline format similar to the original, in some cases, correcting errors, but as shown in the above example, there are a few A's without B's and so on, because the original document (1994) was formatted that way, and I didn't want to substantially re-write the Bylaws at this time. (Save that for another day!)
As I edited, with the printed original by my computer, I did delete the original outline I's, A's, 1's a's, and so on as I went through, letting List do the re-numbering, and using Style to format the newly numbered headings.
Simple shortcuts when auto-generating lists: a [Tab] moves the active heading to the next-lower designation, and [Shift]+[Tab] moves it to the next-higher designation. Occasionally, I have to just use [Delete] to back up over the suggested letter, and start over again with [Return] to force the next letter/number.
And opening Inspector, Text, Lists, as shown in the example above, might help you more easily 'control' the outcome, as does Inspector, Document, TOC, noted earlier.
Hope this helps!

I do not want the online version to do text recognition. How do I stop it?

Every time I upload a document in the online version and use text recognition, it messes up some of the typing. Is there a way to avoid text recognition?

Hi akameni,
There used to be an option to disable OCR via the web interface, but there is no longer. You can, however, disable it from within Adobe Reader, by following the instructions here: How to disable Optical Character Recognition (OCR) when converting PDF to Word or Excel.
Best,
Sara

How can I turn off text recognition when converting PDF to Word?

I just converted a PDF to Word. It was an image file, but the converter recognized certain lines in the illustration as text elements, so some of the pictures are "chopped up". Is there a way to "turn off" the text recognition feature so that I can keep my images intact? Thanks for your help!

Hi watermelon321,
You can't disable OCR if you convert via the ExportPDF website, but you can if you convert from within Reader. This document tells you how: How to disable Optical Character Recognition (O... | Adobe Community
Please let us know if you have additional questions.
Best,
Sara
