PDF OCR Japanese Characters

Hi,
I'm new to the Adobe forums and Adobe products.
I would like to check if there is any way to OCR a mixed Japanese-English PDF into Word, i.e., to recognise the Japanese and English text in the PDF as editable text instead of leaving it as a picture-like PDF file.
I'd appreciate any advice.
Thanks.

Similar Messages

  • [Bug Report] CR4E V2: Exported PDF displays Japanese characters incorrectly

    We are porting a legacy application from VB to Java with Crystal Reports for Eclipse (CR4E). Reports must be exported as PDF files, but the resulting PDFs display Japanese characters incorrectly in fields that use some of the most common Japanese fonts (MS Gothic and MS Mincho).
    Here is our sample Crystal Reports project:   [download related resources here|http://sites.google.com/site/cr4eexportpdf/example-of-cr4e-export-pdf]
    1. PDFExportSample.rpt located under ..\src contains fields with different Japanese fonts.
    2. Run SampleViewerFrameClient#main(..) to open a Java Report Viewer:
        a) At a zoom rate of 100%, everything is OK.
        b) At a zoom rate of 200% or 50%, some fields in Japanese fonts are displayed garbled.
        c) Export to PDF file,
             * Fonts "MS Gothic & Mincho": both ASCII and Japanese characters are rendered incorrectly.
             * Fonts "Meiryo & HGKyokashotai": everything works well.
             * Open the PDF properties and you will see that all fonts are embedded with built-in encoding.
             * Interestingly, if you copy the garbled Japanese characters from Acrobat Reader and
               paste them into a Notepad window, Notepad shows the correct Japanese characters.
               It seems that the PDF export in CR4E picks the wrong glyphs for Japanese characters
               from some TTF files.
    3. Open PDFExportSample.rpt in Crystal Report 2008 Designer (trial version), and export it as PDF.
        The result PDF displays both ASCII & Japanese characters without any problem.
    Test environment as below:
    * Windows XP Professional SP3 (Japanese) with MS Office, which includes extra fonts (e.g. HGKyokashotai)
    * Font versions: MS Gothic, Mincho, Meiryo, all version 5.0
        You can download MS Meiryo from Microsoft's site:
        http://www.microsoft.com/downloads/details.aspx?familyid=F7D758D2-46FF-4C55-92F2-69AE834AC928&displaylang=en
    * Eclipse 3.5.2
    * Crystal Reports for Eclipse, V2, 12.2.207.r916
    Can this problem be fixed? If so, how long will it take to release a patch?
    We are really hoping for a solution before we abandon CR4E.
    Thanks for any reply.

    I have created a [simple PDF file|http://sites.google.com/site/cr4eexportpdf/inside-the-pdf/simple.pdf?attredirects=0&d=1] exported from CR4E. It should display "漢字" (Unicode "\u6F22\u5B57"), but it is rendered as different characters, "殱塸" (Unicode "\u6BB1\u5878").
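    The "\u6F22\u5B57" notation names Unicode code points, not bytes. As a side note, the minimal C sketch below (a generic UTF-8 encoder, not anything taken from CR4E) shows what those code points become when the same text is saved as UTF-8. Inside the PDF the text is stored quite differently, as font-specific character codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
 * Writes 1-4 bytes to out and returns the byte count, or 0 for an
 * invalid code point (UTF-16 surrogates or values above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

    For U+6F22 (漢) it produces E6 BC A2, and for U+5B57 (字) it produces E5 AD 97.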
    Looking inside this simple PDF file (you can open it in any text editor), here is its page content:
    8 0 obj
    <</Filter [ /FlateDecode ] /Length 120>>
    stream ... endstream
    endobj
    Decoding this stream gives:
    /DeviceRGB cs
    /DeviceRGB CS
    q
    1 0 0 1 0 841.7 cm
    13 -13 569.2 -815.7  re W n
    BT
    1 0 0 1 25.75 -105.6 Tm     <-- text position
    0 Tr
    /ttf0 10 Tf                 <-- apply font
    0 0 0 sc
    ( !)Tj                      <-- show character codes 0x20 and 0x21, which select glyphs in the embedded TrueType font subset
    ET
    Q
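    The operand "( !)" in the Tj line is easy to misread as a space and an exclamation mark. For this font it is just two character codes, 0x20 and 0x21, that select glyphs in the embedded subset. Below is a minimal C sketch of decoding a PDF literal string body into its byte codes. It assumes the caller has already stripped the outer parentheses, and it ignores backslash line continuations.

```c
#include <stddef.h>

/* Decode the body of a PDF literal string (the bytes between the outer
 * parentheses, e.g. " !" from "( !)Tj") into raw character codes.
 * Handles \n \r \t \b \f, 1-3 digit octal escapes, and escaped
 * characters such as \( \) \\. Returns the number of bytes written.
 * Assumption: out is at least len bytes long. */
size_t pdf_literal_decode(const char *s, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (c != '\\') {
            out[n++] = (unsigned char)c;
            continue;
        }
        if (++i >= len)
            break; /* dangling backslash: ignore it */
        c = s[i];
        switch (c) {
        case 'n': out[n++] = '\n'; break;
        case 'r': out[n++] = '\r'; break;
        case 't': out[n++] = '\t'; break;
        case 'b': out[n++] = '\b'; break;
        case 'f': out[n++] = '\f'; break;
        default:
            if (c >= '0' && c <= '7') {
                unsigned v = 0;
                int digits = 0;
                while (digits < 3 && i < len && s[i] >= '0' && s[i] <= '7') {
                    v = v * 8 + (unsigned)(s[i] - '0');
                    i++;
                    digits++;
                }
                i--; /* the for loop advances past the last digit */
                out[n++] = (unsigned char)(v & 0xFF);
            } else {
                /* \( \) \\ and unknown escapes: keep the character */
                out[n++] = (unsigned char)c;
            }
        }
    }
    return n;
}
```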
    The only embedded font subset is defined as:
    9 0 obj /ttf0 endobj
    10 0 obj /AAAAAA+MSGothic endobj
    11 0 obj
    << /BaseFont /AAAAAA+MSGothic
    /FirstChar 32
    /FontDescriptor 13 0 R
    /LastChar 33
    /Subtype /TrueType
    /ToUnicode 18 0 R                            <-- point to a CMap object
    /Type /Font
    /Widths 17 0 R >>
    endobj
    12 0 obj [ 0 -140 1000 859 ] endobj
    13 0 obj
    << /Ascent 860
    /CapHeight 1001
    /Descent -141
    /Flags 4
    /FontBBox 12 0 R
    /FontFile2 14 0 R                            <-- point to an embedded TrueType font subset
    /FontName /AAAAAA+MSGothic
    /ItalicAngle 0
    /MissingWidth 1000
    /StemV 0
    /Type /FontDescriptor >>
    endobj
    After decoding, the CMap object is:
    18 0 obj
    /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
    /Registry (AAAAAB+MSGothic) /Ordering (UCS) /Supplement 0 >> def
    /CMapName /AAAAAB+MSGothic def
    1 begincodespacerange <20> <21> endcodespacerange
    2 beginbfrange
    <20> <20> <6f22>                         <-- "u6F22"
    <21> <21> <5b57>                         <-- "u5B57"
    endbfrange
    endcmap CMapName currentdict /CMap defineresource pop end end
    endobj
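    A viewer consults the ToUnicode CMap for copy/paste and search. That would explain why text copied from Acrobat Reader pastes correctly into Notepad even though the drawn glyphs are wrong: the codes map to the right Unicode values, but the glyph data in the FontFile2 subset does not match. Below is a minimal C sketch of applying simple bfrange entries like the two above. It assumes one <lo> <hi> <dst> entry per line with a single-code-unit destination; the array form is not handled.

```c
#include <stdio.h>
#include <string.h>

/* One ToUnicode bfrange entry of the form <lo> <hi> <dst>. */
struct bfrange {
    unsigned lo, hi, dst;
};

/* Scan CMap text line by line and collect simple <lo> <hi> <dst> entries.
 * Lines that do not start with that pattern (e.g. "2 beginbfrange" or the
 * array form with [...]) are skipped. Returns the number of entries. */
size_t parse_bfranges(const char *text, struct bfrange *out, size_t max)
{
    size_t n = 0;
    const char *line = text;
    while (line != NULL && *line != '\0' && n < max) {
        const char *p = line;
        struct bfrange r;
        while (*p == ' ' || *p == '\t')
            p++;
        if (sscanf(p, "<%x> <%x> <%x>", &r.lo, &r.hi, &r.dst) == 3)
            out[n++] = r;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return n;
}

/* Map a character code to a Unicode code point; returns 0 if unmapped. */
unsigned bfrange_lookup(const struct bfrange *r, size_t n, unsigned code)
{
    for (size_t i = 0; i < n; i++)
        if (code >= r[i].lo && code <= r[i].hi)
            return r[i].dst + (code - r[i].lo);
    return 0;
}
```

    With the CMap from this post, codes 0x20 and 0x21 map to U+6F22 and U+5B57, i.e. "漢字".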
    I wrote out the embedded TrueType font subset (object "14 0 obj") to a file named "[embedded.ttf|http://sites.google.com/site/cr4eexportpdf/inside-the-pdf/embedded.ttf?attredirects=0&d=1]". It really is a tiny TrueType font file containing only the wrong glyphs for "漢" and "字". Everything else seems OK; CR4E simply fails to pick the right glyphs from the TrueType file (msgothic.ttc).
    Does this help? I look forward to any solution.
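    msgothic.ttc is a TrueType Collection: one file holding several fonts (MS Gothic, MS PGothic and MS UI Gothic), each with its own table directory. One hypothesis consistent with these symptoms, though not confirmed in this thread, is that the exporter reads glyphs through the wrong font or offset within the collection. The minimal C sketch below lists the per-font offsets from a TTC header (tag "ttcf", 16-bit major and minor version, 32-bit numFonts, then one big-endian 32-bit offset per font, as laid out in the OpenType specification). It could help check what the collection contains.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit value. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the header at the start of a .ttc file held in buf.
 * Stores up to max_offsets table-directory offsets (one per font in the
 * collection) and returns numFonts, or -1 if buf is not a TTC header or
 * is truncated. */
long ttc_font_offsets(const unsigned char *buf, size_t len,
                      uint32_t *offsets, size_t max_offsets)
{
    if (len < 12 || memcmp(buf, "ttcf", 4) != 0)
        return -1;
    uint32_t num_fonts = be32(buf + 8); /* bytes 4-7 hold the version */
    for (size_t i = 0; i < num_fonts && i < max_offsets; i++) {
        size_t at = 12 + 4 * i;
        if (at + 4 > len)
            return -1;
        offsets[i] = be32(buf + at);
    }
    return (long)num_fonts;
}
```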

  • Re:How to display Japanese Characters

    Dear all
    Does anyone have an idea how to display Japanese characters in PDF output generated by Reports6i?
    Reports in Discoverer work fine, but in PDF output the Japanese characters are lost.

    Hi Rohit
    Thanks for the prompt reply.
    1) Suppose we upgrade Reports6i by applying the latest patch.
    2) Then we develop the reports in the 9i editor and run them in the live environment, which is Oracle9iAS (Application Server) with the Reports6i server (patch 14). Will we still lose the effects, or could this cause problems later on?
    Thanks
    Jai

  • Creating a PDF from a SAAS app creates boxes instead of Japanese characters

    I'm using an online app (Unleashed Software) to "print" invoices, and the printed invoices show boxes instead of Japanese characters. The really weird thing about this problem is that it occurs only on certain devices. I've tested on Macs, Windows, Android, and iOS; on some devices I get the problem and on others I don't. It's not just a Windows problem or an iOS problem. I have also tried different browsers (Chrome, IE, Firefox, Safari). Changing the browser doesn't help on a device that won't output Japanese characters in a PDF properly.
    I'm wondering how PDFs are generated when using online software. Since I can't reproduce the problem on certain devices, it seems to me that the software is using some local settings to render the PDF incorrectly.
    Any ideas of how I could go about troubleshooting this problem?

    Hi,
    Could you please answer the following questions:
    1. What version of Crystal Reports are you using?
    Go to Help > About to find out.
    2. What font are you using on the report?
    Try changing the font to MS Gothic or Arial Unicode MS (preferably MS Gothic), then export the report to PDF again.
    This may help.
    Thanks,
    Praveen G

  • Chinese/Japanese characters not appearing on smartforms PDF output

    Hi,
    The print preview of the Smartforms output layout correctly displays characters of local languages such as Chinese and Japanese, but when I print this output, junk characters are printed.
    However, the same printer can print Chinese and Japanese characters from MS Word.
    So this issue occurs only when printing from SAP.
    In the spool I can see the Chinese/Japanese characters correctly, but when I convert the spool to PDF using program RSTXPDFT4, the PDF again shows junk characters in place of the Chinese characters.
    Thanks!

    This could have several causes:
    1) Make sure your printer is Unicode-enabled.
    2) Make sure your Unicode-enabled printer is configured in SAP.
    3) Make sure your printer device is supported by SAP (you can find the list of SAP-recommended printers at www.service.sap.com).
    4) Check whether the correct device type is used for printing Chinese and Japanese characters.
    5) Check the code pages.
    6) Make sure you use a font family that covers Chinese and Japanese characters; a Cyrillic font family will not print them.
    Regards,
    SaiRam

  • Square boxes instead of Japanese characters in pdf output of a crystal rep

    Hi All,
    Has anyone faced an issue where the PDF output of a Crystal Report shows square boxes instead of Japanese characters? The report itself looks perfect, and when I save the output in XLS or RTF format the characters also look correct; the issue occurs only when the output is saved as PDF.
    I have the language pack installed on my machine.
    My guess is that some fields are not wide enough for the characters, since characters in other fields appear correctly in the PDF. I need to test this, and it may take a few days before I can access this report, so I want to gather information first. If anyone has a solution to this issue, please let me know.
    Thanks,
    Ravi

    Hi,
    Could you please answer the following questions:
    1. What version of Crystal Reports are you using?
    Go to Help > About to find out.
    2. What font are you using on the report?
    Try changing the font to MS Gothic or Arial Unicode MS (preferably MS Gothic), then export the report to PDF again.
    This may help.
    Thanks,
    Praveen G

  • Find and Replace Japanese characters in pdf file on iPhone

    Hi everybody!
    I want to find and replace Japanese characters in a PDF on iPhone.
    I am using zlib to inflate the stream ... endstream blocks and extract text. It works fine with Latin text.
    But I don't know how to handle Japanese characters.
    I decoded a sample Japanese PDF file and found that the Japanese characters are represented as hex strings: "<01b7><0e230a23>..."
    But I don't know how to convert Japanese characters to hex strings like that.
    Can anybody help me?
    Thanks!

    Searching is the same process as extracting, since both are about turning page content into something understandable. So that is still what you need to learn and understand, referring back to all the previous sections about font formats, etc.
    Replacing text in a PDF is EXTREMELY DIFFICULT for two reasons: subset fonts and explicit glyph positioning. Have you determined (conceptually, if nothing else) how you plan to address these two issues?
    PDF doesn't use UTF-8 for page content, so don't worry about that.
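    For a font with a 2-byte encoding such as Identity-H (an assumption; check the font's /Encoding), a hex string like <0e230a23> holds two character codes, 0x0e23 and 0x0a23. These are font-specific codes, often glyph IDs, not Unicode. Finding a Japanese character therefore means mapping codes to Unicode through the font's ToUnicode CMap. Writing one means reversing that mapping and making sure the glyph exists in the (possibly subset) embedded font. The minimal C sketch below covers only the hex-string part; hex_to_codes and codes_to_hex are hypothetical helper names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Turn PDF hex-string text (e.g. "<01b7><0e230a23>") into 2-byte codes.
 * Non-hex characters such as '<', '>' and whitespace are skipped.
 * Leftover digits at the end are padded with zeros (a simplification).
 * Returns the number of codes written. */
size_t hex_to_codes(const char *s, unsigned short *codes, size_t max)
{
    unsigned v = 0;
    int digits = 0;
    size_t n = 0;
    for (; *s != '\0' && n < max; s++) {
        int h = hexval((unsigned char)*s);
        if (h < 0)
            continue;
        v = (v << 4) | (unsigned)h;
        if (++digits == 4) {
            codes[n++] = (unsigned short)v;
            v = 0;
            digits = 0;
        }
    }
    if (digits > 0 && n < max)
        codes[n++] = (unsigned short)(v << (4 * (4 - digits)));
    return n;
}

/* Format 2-byte codes as one PDF hex string, e.g. "<0e230a23>".
 * out must hold at least 4 * n + 3 bytes. */
void codes_to_hex(const unsigned short *codes, size_t n, char *out)
{
    char *p = out;
    *p++ = '<';
    for (size_t i = 0; i < n; i++)
        p += sprintf(p, "%04x", (unsigned)codes[i]);
    *p++ = '>';
    *p = '\0';
}
```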

  • How to create Japanese characters PDF files -- Oracle9i

    After modifying the uifont.ali file, I can generate PDF files with Japanese characters by running the command line (rwrun.exe) on Oracle 9i AS.
    If I call the report file from Oracle9i Forms (using run_report_object), the PDF file is created, but the Japanese characters are not displayed correctly.
    Can anyone help me?
    Thanks.

    Hi,
    Please go through the following links; they should help:
    http://lucamezzalira.com/2009/02/28/create-pdf-in-runtime-with-actionscript-3-alivepdf-zinc-or-air-flex-or-flash/
    http://forums.adobe.com/thread/753959
    http://blog.unthinkmedia.com/2008/09/05/exporting-pdfs-in-flex-using-alivepdf/
    Thanks and Regards,
    Vibhuti Gosavi | www.infocepts.com

  • Oracle Report Server Issue with Japanese Characters

    We are trying to set up an Oracle Report Server to print Japanese characters in PDF format.
    In production we have separate Oracle Report Servers for printing English, Chinese and Vietnamese characters in PDF format using Oracle Reports, running properly on Unix AIX 5.3. Now we have a requirement to print Japanese characters, so we set up a new server configured the same way as the Chinese and Vietnamese report servers. However, we are not able to print the Japanese characters.
    Here are the steps we followed to configure this new server.
    1.     We have modified the reports.sh to map the proper NLS_LANG (JAPANESE_AMERICA.UTF8) and other Admin folder settings.
    2.     We have configured the new report server via OPMN admin.
    3.     We have copied arialuni.ttf to the Printers folder and converted the same .ttf file to AFM format. This AFM file has been copied to the $ORACLE_HOME/guicommon/gk/JP_Admin/AFM folder.
    4.     We have modified the uifont.ali (JP_admin folder) file for font subsetting.
    5.     We have put an entry in JP_admin/PPD/datap462.ppd as *Font ArialUnicodeMS: Standard "(Version 1.01)" Standard ROM
    6.     We have modified the Tk2Motif.rgb (JP_admin folder) file for character set mapping (Tk2Motif*fontMapCs: iso8859-1=UTF8) as we have enabled this one for other report servers as well.
    Environment Details:-
    Unix AIX version : 5300-07-05-0831
    Oracle Version : 10.1.0.4.2
    NLS_LANG : JAPANESE_AMERICA.UTF8
    Font Mapping : Font Sub Setting in uifont.ali
    Font Used for Printing : arialuni.ttf (Font Name : Arial Unicode MS)
    The error in the rwEng trace file (rwEng-0.trc) is:
    [2011/9/7 8:11:4:488] Error 50103 (C Engine): 20:11:04 ERR REP-3000: Internal error starting Oracle Toolkit.
    The error when trying to run the reports is:
    REP-0177: Error while running in remote server
    Engine rwEng-0 crashed, job Id: 67
    Our investigations and findings…
    1.     We disabled the entry Tk2Motif*fontMapCs: iso8859-1=UTF8 in Tk2Motif.rgb and then started the server. No error is thrown in the rwEng trace file, and we can print the report in PDF format (please see the attached japarial.pdf), but only junk characters appear. We checked the document properties of the PDF file to confirm font subsetting, and font subsetting is used.
    2.     If we enable the above entry, the rwEng trace shows the above error (Oracle Toolkit error) and the Reports engine crashes.
    We would greatly appreciate any help in resolving this issue.

    Maybe 7zip or another tool has workarounds for broken file names; you could try that.
    Or you could go over the files in the zip archive one by one and write them to files out-1, out-2, ..., out-$n without concerning yourself with the file names. You could then recover the file extensions from each file's MIME type.
    This program might work:
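    #include <stdio.h>
    #include <stdlib.h>
    #include <zip.h>
    /* Output names: ./out-0000.bin, ./out-0001.bin, ... */
    static const char *template = "./out-%04d.bin";
    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s archive.zip\n", argv[0]);
            return 1;
        }
        int err = 0;
        zip_t *arc = zip_open(argv[1], ZIP_RDONLY, &err);
        if (arc == NULL) {
            printf("Failed to open ZIP, error %d\n", err);
            return -1;
        }
        zip_int64_t n = zip_get_num_entries(arc, 0);
        printf("%s: # of packed files: %lld\n", argv[1], (long long)n);
        for (zip_int64_t i = 0; i < n; i++) {
            zip_stat_t stat;
            if (zip_stat_index(arc, (zip_uint64_t)i, ZIP_FL_UNCHANGED, &stat) != 0)
                continue;
            /* Heap buffer: a stack array could overflow on large entries. */
            char *buf = malloc(stat.size > 0 ? stat.size : 1);
            /* Fixed size: sizeof(template) would only be the size of a pointer. */
            char oname[64];
            zip_file_t *f = zip_fopen_index(arc, (zip_uint64_t)i, ZIP_FL_UNCHANGED);
            if (buf == NULL || f == NULL) {
                free(buf);
                if (f != NULL)
                    zip_fclose(f);
                continue;
            }
            zip_int64_t got = zip_fread(f, buf, stat.size);
            snprintf(oname, sizeof(oname), template, (int)i);
            FILE *of = fopen(oname, "wb");
            if (of != NULL) {
                if (got > 0)
                    fwrite(buf, 1, (size_t)got, of);
                fclose(of);
            }
            printf("%s: %s => %llu bytes\n", argv[1], oname, (unsigned long long)stat.size);
            zip_fclose(f);
            free(buf);
        }
        zip_close(arc);
        return 0;
    }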
    Compile with
    gcc -std=gnu99 -O3 -o unzip unzip.c -lzip
    and run as
    ./unzip $funnyzipfile
    You should get template-named, numbered output files in the current directory.
    Last edited by 2ion (2015-05-21 23:09:29)

  • Problem while exporting to PDF with Japanese characters in Crystal Reports XI

    Hi,
    I am using Crystal Reports XI with classic ASP on Windows 2003 Enterprise Server SP2. My application has to export the report to PDF, CSV and DOC formats, and the report contains Japanese strings. When exporting to PDF, empty boxes are displayed in place of the Japanese strings, and in the CSV file question marks are displayed instead. The DOC file, however, is exported correctly. I have not installed any language support software on either the server or the client machine. I have used the MS Gothic and Arial Unicode MS fonts for the text objects that contain Japanese strings.
    Please suggest a solution so that I get a PDF file with Japanese strings instead of empty boxes or question marks.
    Do I need to install any language support pack software?
    Thanks in advance.
    Regards,
    Manju

    Hi Don,
    Thank you for your reply. I resolved the issue of exporting Japanese data to PDF from Crystal Reports XI by installing the language pack.
    But when I export to CSV or TXT, I get ??? instead of Japanese characters. I also tried exporting through the Crystal Reports designer and got ??? instead of Japanese.
    The version I am using is Crystal Reports XI 11.0.0.1282.
    Crystal Reports Designer is installed on a Windows XP machine.
    The fonts set for the text objects are Arial Unicode MS, MS PMincho and MS PGothic.
    Does Crystal Reports XI 11.0.0.1282 support UTF-8 for CSV/TXT export?
    Please provide me a solution
    Thanks in advance,
    Regards,
    Manju

  • Acrobat Pro 9.3.1 does not convert certain Japanese characters

    I have a text document that contains a mix of Roman and Japanese characters. When I use Create PDF From File on that text document, a sequence of 2 Japanese characters disappears: the text before and after them appears in the PDF, but there is a gap in between.
    The sequence is (I don't know if I can insert Japanese here...)
    before監査証跡after
    When the PDF is generated, the first 2 Japanese characters (after the last 'e' in before) do not appear in the PDF.
    Here is the source text document (UTF-8 encoded with BOM): http://www.scribd.com/doc/28158046
    and here is the resulting PDF: http://www.scribd.com/doc/28158121
    Anyone seen this before?

    If I paste your "before監査証跡after" into Notepad and save it as UTF-8 text, I can print the file to the Acrobat 9.3.1 Pro "Adobe PDF" printer with no problems at all: the 4 kanji appear in a font Acrobat calls "MS-UIGothic". If I right-click the saved *.txt file in Windows Explorer (Vista 64) and select "Convert to Adobe PDF", I still get all the kanji, although the first shows up in Adobe Ming, the 2nd in Adobe Song, and the last 2 in KozGoPr6N.
    I can't explain what's going on here, but perhaps this can help point you down a useful path.
    David
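    Before blaming Acrobat, it may be worth confirming that the source file really is UTF-8 with a BOM and that the four kanji are intact. Below is a minimal C sketch, a plain UTF-8 check written for this purpose and not anything Acrobat does. It looks for the BOM (EF BB BF) and counts code points; "before監査証跡after" should give 15. It cannot explain why Acrobat drops the characters. It only rules out a damaged input file.

```c
#include <stddef.h>
#include <string.h>

/* Set *has_bom if buf starts with the UTF-8 BOM, then count the code
 * points after it. Returns the count, or -1 for a malformed sequence
 * (bad lead byte, bad continuation byte, or truncated input).
 * Simplification: overlong forms and surrogates are not rejected. */
long utf8_count(const unsigned char *buf, size_t len, int *has_bom)
{
    size_t i = 0;
    long count = 0;
    *has_bom = len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
    if (*has_bom)
        i = 3;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
            extra = 1;
        else if ((c & 0xF0) == 0xE0)
            extra = 2;
        else if ((c & 0xF8) == 0xF0)
            extra = 3;
        else
            return -1;
        if (i + extra >= len)
            return -1; /* sequence runs past the end of the buffer */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return -1;
        i += extra + 1;
        count++;
    }
    return count;
}
```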

  • Cannot write in Japanese characters on Firefox 7.0 using Mac OSX 10.5.8

    Ever since I upgraded to Firefox 7.0, I cannot input Japanese characters into Firefox. I use Mac OS X 10.5.8. My international settings allow me to type in Japanese and other languages, but in Firefox nothing other than English comes up.
    I can write in Japanese on Microsoft Word, Safari, and a slew of other applications, so it's definitely a Firefox issue.

    The following worked for me:
    http://support.mozilla.com/en-US/questions/684867
    "Problem: I had the same problem but with mac osx 10.5.8, Firefox 3.6.6 sometimes prints and sometimes does not, but the preview was always blank and I could not save to pdf file.
    Solution: I downloaded Firefox 3.5.10, installed it on my desktop and opened it, tried print preview and it worked, quit Firefox 3.5.10 and ran 3.6.6 and voila! print preview is working fine and I can save to pdf fine.
    I guess running 3.5.10 fixes the profile in a certain way. I have no idea how that happened but it worked."

  • Japanese characters in okular

    Hi,
    I want to be able to read Japanese characters with Okular. Right now Okular doesn't show anything if I open a PDF with Japanese text.
    I tried installing some poppler packages to fix this, but it doesn't work. Do I have to do something more to get it to work, or isn't Okular capable of showing Japanese characters?
    I don't even know if I'm supposed to install the poppler packages, but I read somewhere that it should solve the problem.
    I have installed these packages:
    [adam@adam-pc ~]$ pacman -Qs poppler
    local/poppler 0.10.5-1
    PDF rendering library based on xpdf 3.0
    local/poppler-data 0.2.1-1
    Encoding data for the poppler PDF rendering library
    local/poppler-glib 0.10.5-1
    Poppler glib bindings
    local/poppler-qt 0.10.5-1
    Poppler Qt bindings

    kanalj, any luck with this?  I just ran into this problem (it worked previously), and reinstalling the 4 poppler packages solved the problem.

  • Converting garbled characters for JAPANESE characters in a custom table

    Hi all,
    I have a custom table that store Japanese characters.
    After my company upgraded to ECC6.0, much of the data in this custom table became garbled.
    Is there any SAP tool I can use to correct the garbled Japanese characters?
    Thanks,
    William Wilstroth

    Hi Nils,
    I really had a field day reading about and testing Unicode conversion (UC)... To my disappointment, I do not have authorization to use SUMG and SCP, or a few other transaction codes.
    I finally told our senior technical management that this table might need some changes.
    Does my problem have anything to do with MDMP, since it is no longer supported in ECC6? I found some code that searches for MDMP in RSVTPROT.
    My colleagues suggest correcting the data from table DBTABLOG, which in my opinion is not the right way.
    Thanks,
    William

  • How can I get Japanese characters to show up for my music in iTunes?

    I am not exactly sure what generation my iPod is, but it says copyright 2004 on the back. It is 20 GB. It has no problems displaying Japanese characters if a particular song of mine is Japanese. I have my iPod set up to manually manage music. I like to carry my iPod between home and work. At both home and work, I use PCs with Windows XP Pro installed. I also use the latest version of iTunes (ver. 7.1.1.5).
    However, I have run into a weird problem. On my home computer, my iTunes displays Japanese characters perfectly fine, but on my work computer, whenever there is a Japanese character, iTunes does not recognize it and puts an ugly "square" character in its place. How do I get my iTunes to display these Japanese characters properly?
    Thanks.

    Well I just answered my own question. I needed to install the files for East Asian languages via the WinXP Control Panel. So if anyone else runs into this problem, there ya go!

Maybe you are looking for

  • File adapter scheduling

    I want the file adapter to pick up the file from Tuesday to Saturday, and not on Sunday or Monday. Do I need to specify the interval as 86400 in the configuration and schedule it in RWB from Tuesday to Saturday?

  • Can't get iTunes to get lost

    Hi all, This has to be a question for the 'weird' section. I'm using iTunes 6.0.4.2 on my Pentium 4-based Windows XP machine, and I cannot get it to stay closed. Every time I click on close, it re-opens after a 10 second pause. I don't have my iPod

  • Downpayment clearing

    Hi guys, I am using doc types KR and KN for downpayment. If I use doc type KR, I can clear the downpayment with no problem; however, if I use doc type KN, I cannot. Can anyone suggest what I need to do to fix this? I am using F-47 for downpayment and

  • Management of Global Employees (MGE) query

    Hello all, We are implementing MGE (Management of Global Employees) functionality for Personnel Administration for a global company. I have activated all Global Employment (GE) switches in table T77S0 after consultation with SAP. We don't plan to imp

  • How to get Beats by Dre software on my Pavillion DV6-6170SD?

    Hello everyone, I  recently installed Windows 8 (64 bit) on my HP Pavillion DV6-6170SD and most of the drivers were working right away. I used to have Beats by Dre audio software before I updated my Windows and now I can't find it on the website belo