Classify and cluster documents with algorithm in java

Hello everybody,
I want to classify and cluster, let's say, 100 documents which are on the Intranet.
I was thinking of including an algorithm in Java that can do this.
But I have no clue where to start.
Please give me some advice or maybe an example ;),
Greetings,
Thijs Leusink

Thijs Leusink wrote:
I was thinking of including an algorithm in Java that can do this.
But I have no clue where to start.

Two steps:
1. Find an algorithm that can do what you want.
2. Write it in Java.
Start with the first step: if you haven't done that yet, I don't think we can help you here. Possibly you could use Google to find an algorithm.
Then go on to the next step: if you don't even know how to begin writing a Java program then there is a collection of introductory tutorials here:
http://java.sun.com/docs/books/tutorial/
If you are having problems after that, come back here and ask a specific question.
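For the first step, one simple starting point is to compare documents by the words they share. The following is a minimal sketch, assuming the documents are already available as plain-text strings. The class and method names (DocumentClusterer, termCounts, cosine, cluster) and the 0.3 threshold are made up for illustration. It uses single-pass "leader" clustering over word counts, which is only one simple option and not necessarily the right algorithm for your documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch: groups documents by cosine similarity of their word counts. */
final class DocumentClusterer {

    /** Counts lowercase word occurrences in one document. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two word-count vectors; 0.0 if either one is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    /**
     * Single-pass "leader" clustering: each document joins the first cluster whose
     * first member is at least `threshold` similar to it, otherwise it starts a
     * new cluster. Returns the clusters as lists of document indices.
     */
    static List<List<Integer>> cluster(List<String> docs, double threshold) {
        List<Map<String, Integer>> vectors = new ArrayList<>();
        for (String d : docs) {
            vectors.add(termCounts(d));
        }
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            List<Integer> home = null;
            for (List<Integer> c : clusters) {
                if (cosine(vectors.get(i), vectors.get(c.get(0))) >= threshold) {
                    home = c;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(i);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "invoice payment due invoice",
            "payment of the invoice is due",
            "server restart and server logs");
        System.out.println(cluster(docs, 0.3)); // prints [[0, 1], [2]]
    }
}
```

For classification rather than clustering, the same cosine function could compare a new document against a few hand-labeled examples and take the label of the most similar one.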

Similar Messages

  • Reading and displaying Ms.Word document with web dynpro java

    Hi,
    I'm using NetWeaver Developer Studio 7.0.21.
    I'm developing a Web Dynpro Java application. I'm having difficulty reading and displaying a Word document in its original format in a Web Dynpro view. Is it possible?
    If it is possible, is there a blog or similar resource about it?
    Thanks.

    Hello,
    You have to use the Office Integration Library. Please, follow the documentation below:
    http://help.sap.com/saphelp_nw04/helpdata/en/c3/32853febec3c17e10000000a114084/frameset.htm
    I hope this helps you.
    Regards,
    Blanca

  • Differences between billing and accounting document with down paym. request

    Hello to all
    We are creating a down payment request billing document from one sales order with two sales order items that have different material tax classifications.
    The billing document is created correctly (with the same value for each item), but the accounting document is not.
    If you compare the values, the billing document and the accounting document do not match. In the accounting document the system decreases the amount of the first item by the tax amount of the second item, and likewise decreases the amount of the second item by the tax amount of the first item. The tax amounts of the two items are then correct.
    If the two items have the same material tax classification, it works fine.
    Can anyone help me? Is it a problem with the payment request billing?
    Thanks in advance.
    Edited by: tonnetti pablo on Jan 8, 2008 9:46 AM

    Hi,
    The following Customizing settings have to be made for down payment processing:
    Settings for the billing plan – To activate the billing plan function, maintain the materials, for which you wish to process down payments, with item category group 0005 (milestone billing). This gives the item type TAO via item type determination. The item type TAO calls up the billing plan function.
    You need to implement the following activities in the billing plan for down payments:
    Maintain deadline category – This determines the billing rule (percentage or value down payment) for the down payment request. The system assigns billing type FAZ (payment request) defined in the standard system with billing category P. (For the billing type FAZ there is the cancellation billing document type FAS in the standard system).
    Maintain the deadline proposal – Use the down payments that are due for the proposed deadlines.
    Maintaining a Pricing Procedure with the Condition Type AZWR:
    In the standard system the condition type AZWR is delivered for the down payment value already provided but which has not yet been calculated. You must include this condition type in the relevant pricing procedure before output tax.
    Enter condition 2 (item with pricing) and the calculation formula 48 (down payment clearing value must not be bigger than the item value) for the condition type AZWR.
    Before the condition AZWR you can create a subtotal with the base value calculation formula 2 (net value). If the condition AZWR is changed manually, you can get information on the original system proposal from the subtotal.
    Maintain the printing indicator – The pricing procedure can not be marked as a transaction-specific pricing procedure (field Spec.proc.) The condition type AZWR has the calculation type B (fixed amount) and the condition category E (down payment request / clearing).
    Maintaining the Billing Document – In the standard system there is the billing type FAZ (down payment request) and the billing type FAS for canceling . The down payment is controlled using the billing category P of the billing type. A billing type becomes a down payment request when the billing category P is assigned. You have to maintain blocking reason 02 (complete confirmation missing) for the billing documents and assign it to billing type FAZ.
    Copying control – Copying requirement 20 must be entered in copying control at item level for the down payment request. In the standard system the order type TA for copying control is set up according to the billing type FAZ for the item category TAO.
    Copying requirement 23 must be entered in copying control at item level for down payment clearing. In the standard system the order type TA for copying control is set up according to the billing type F2 for the item category TAO.
    Financial Accounting settings – A prerequisite for down payment processing is that the account is assigned to the underlying sales document. To do this, change the field status settings in Customizing as follows:
    Set reconciliation accounts (transaction OBXR) – For the 'received down payments' and 'down payment requests' from the G/L accounts you have selected, you should assign the field status definition G031.
    Maintain accounting configuration (transaction OBXB) – For the down payments (posting key ANZ in the standard system) and the output tax clearing (posting key MVA in the standard system), you must maintain the posting key.
    You must also carry out a G/L account number assignment for the tax account.
    Maintain the posting key (transaction OB41) – For posting key 19, set the sales order as an optional field.
    Maintain the field status definition (transaction OB14) – For field status variant 0001, field status group G031, set the sales order as an optional field.
    Assign the company code to the field status variants (transaction OBC5).
    Regards,
    Siddharth.

  • Highlight File Format and PDF Documents with Chinese and English characters

    I'm a developer working on an application that makes use of the Highlight File Format / external highlight server capabilities of Adobe Reader.
    The highlighting worked correctly until we started to introduce pdf documents that were scanned to recognize Chinese in addition to English.
    The XML file seems to have the correct values in it. For example, if the 10 characters to highlight are at position 41 on the first page, the XML file contains <loc pg=0 pos=41 len=10>.
    If the document is scanned for English only, it works fine: the highlight starts at character 41. If the same document is scanned for Chinese and English, the highlight starts at character 22.
    Has anyone had a similar experience? Do you know a solution?

    Hi,
    I don't know about BIP and the specifics of your context, but here are some general answers for the XLIFF format:
    <?xml version = '1.0' encoding = 'utf-8'?>
    Can we change encoding to 'ISO-8859-1' as soon as we convert the file format?

    If you also save the file as ISO-8859-1, yes. But you can do this only for languages supported by Latin-1. It makes sense to keep the files in UTF-8.

    A-2) The section <header><skl><internal-file> contains a huge string which seems to be binary... What is this? Can we delete it?

    That's likely to be the skeleton file: the data used to rebuild the original format after translation. Most likely it should stay there.

    A-3) Can we have one XLF file with multiple <file> sections (one per language to translate to)? This can be very useful for us to manage only one translation file per report template.

    Yes and no. Yes, you can have several <file> elements in an XLIFF document, but no, they must all be for the same language pair. XLIFF is designed to work with bilingual files, not multilingual files.

    A-4) The most important section for translation is included in the <trans-unit> tag. Each one has a distinct id like "49e41f8f"... Can we replace this with a more meaningful value?

    Those id attributes are used by the filter to merge back the data after translation. You should preserve them.

    A-5) The language format is like "en-US" (language code + territory code). Is it case sensitive?

    No, it is not case-sensitive (en-us == en-US). The values of xml:lang are not case sensitive (unlike other XML attribute values). en-US is just the recommended notation.

    The XLIFF specification is here:
    http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html
    Hope this helps,
    -ys

  • I am unable to attach scanned documents or pictures to emails. The size of the scanned documents and the pictures is well below the 20 MB limit. I can attach Excel and Word documents with no problem

    When trying to attach a scanned document or picture to an email, the process goes on forever: it just keeps working and trying to attach these documents/pictures. In the lower left-hand corner of the screen it says "waiting for mail.yahoo.com".

    Purplehiddledog wrote:
    I do backup with iCloud.  I can't wait until the new iMac is available so that I can once again have my files in more than 1 location without needing to rely solely on the cloud. 
    I also rely on iTunes and my MacBook and Time Machine as well as backing up to iCloud. I know many users have now gone totally PC free, but I chose to use iCloud merely as my third backup.
    I assume that the restore would result in my ability to open Pages and Numbers and fix the problem with deleting apps, but this would also mean that if my Numbers documents still exist solely within the app and are just not on iCloud for some reason that they would be gone forever.  Is that right?
    In a word, yes. In a little more detail.... When you restore from an iCloud backup, you must erase the device and start all over again. There is no other way to access the backup in iCloud without erasing the device. Consequently, you are starting all over again. Therefore, it would also be my assumption that Pages and Numbers will work again and that the deleting apps issues would be fixed as well.
    If the documents are not in the backup, and you do not have a backup elsewhere, the documents could be gone forever.

  • Split and Rename Document with Bookmark?

    Hello,
    I don't know if this is possible, but I figured this would be the place to find out. What we would like to do is split a multi-page PDF document into single pages. Each single-page PDF file would be renamed to its respective bookmark in the multi-page document. Each page has its own bookmark in the multi-page document, so there wouldn't be any name conflicts.
    Has anyone ever heard of this, or better yet, have a script to do this? I would imagine this would be a challenging script to write, if it's even possible. If anyone with Acrobat scripting know-how could please help, I would greatly appreciate it. Any type of assistance with this would be awesome!
    Thank you in advance,
    Danny

    Something like this, perhaps?
    http://try67.blogspot.com/2008/12/acrobat-extract-chapters-by-bookmarks.html

  • Documents with Smart Objects - Very slow to open and Save - CS6 Photoshop

    When opening and saving documents with smart objects, Photoshop freezes: the Adobe PS loader (circle of dots) is replaced by the system loader (multi-colored wheel of death), which spins for 30 seconds or more.
    What I've tried so far, based on looking at various posts:
    Photoshop Preferences
    Save in Background: off
    Maximize PSD and PSB File Compatibility: Never
    Cache Tile Size: 128k
    Advanced Graphics Processor Settings: Basic & Normal
    Layer Panel options: No Thumbnail
    Observations and workthroughs to date
    The file size and the number of smart objects affect the slowdown exponentially, i.e. the more smart objects you have, the worse it gets
    These files worked perfectly in PS CS5
    It also happens on files natively created in PS CS6
    The CPU is maxing out at 100% while PS loads
    Closing or opening suitcase has no effect.
    System:
    iMac 27-inch, Mid 2011
    Processor  3.4 GHz Intel Core i7
    Memory  16 GB 1333 MHz DDR3
    Graphics  AMD Radeon HD 6970M 1024 MB
    Mac OS X Lion 10.7.5 (11G63)
    Suitcase 4
    Anyone got any ideas? This is making me go nuts!

    A solution!
    It turns out the problem in my case was in fact Suitcase. Previously, I'd tried turning it off, but that didn't fix the problem, so this time, I uninstalled it completely and the problem disappeared. I then began re-adding it (installed 15.0.1, upgraded it, etc.) and the problem resurfaced with the addition of the Photoshop-specific plugin. Deleting that plugin solved the problem. So it seems that "disabling" Suitcase by stopping the TypeCore doesn't seem to actually disable all of the tentacles it sticks into your system.
    You can find the plugin here: Applications / Adobe Photoshop CS6 / Plug-ins / Automate / ExtensisFontManagementPSCS6.plugin
    (After a restart, I also had to delete the font cache, as described here http://helpx.adobe.com/photoshop/kb/troubleshoot-fonts-photoshop-cs5.html but your mileage may vary.)
    Alternately, if you don't want to delete the plugin, disabling it from within Photoshop seems to work as well. To do that, go to File > Automate > Extensis, click Preferences..., then deselect Enable Suitcase Fusion 4 Auto-Activation.
    Fortunately, the plugin doesn't seem necessary at all to use the core functionality of Suitcase (enabling and disabling fonts) in Photoshop. I didn't even know what these app-specific plugins did until researching this problem, and I still don't quite understand the point of them. I guess they allow you to let the apps for which they're installed do a little bit more of their own management (enable a font via Suitcase that isn't enabled system-wide), but that seems like more control than I need: if I'm enabling a font, I want all my software to be able to use it.
    Anyway, the problem seems to be completely solved on my system now, though I just did all this, so more testing over the next few days is required. I'll post here if any issues crop up. I'm interested in hearing if this solves it for anyone else as well.

  • Need information about FI document  with header and no items

    Hello Experts!
    I need your help understanding the following subject:
    How is it possible that F.13 sometimes produces an FI document with a header and no items,
    and at other times an FI document with a header and items?
    We didn't find any customizing topics about this (to enable or disable a header without items). Do any exist?
    The only difference between the two cases is the document types involved, which are not the same.
    Regards,
    Josiane

    Hi,
    We verified and compared everything. It is the same everywhere except the document type of the invoice.
    first case, F.13 create conciliation fi document with header and no items
    invoice ( doc. type YB )
    payment ( doc type ZT)
    conciliation ( doc type XZ )
    2th case, F.13 create conciliation fi document with header   and none items
    invoice ( doc. type YT )
    payment ( doc type ZT )
    conciliation ( doc type XZ )
    We compared every field and value between the invoices of the 1st and 2nd cases, and every field and value between the payments of the 1st and 2nd cases.
    (We created test cases in our own integration system to reproduce this.)
    We also compared the customizing of invoice document types YB and YT, and they seem identical...
    The procedure is the same with F.13 to do the conciliation in both cases.
    We got the same amount, same currency, etc.
    We still don't understand why, with invoice type YT, the conciliation document created is not the same as with invoice type YB...
    Does nobody have an idea?

  • Clearing Control-CL document with FICA items

    Hi Friends,
    I have the following scenario.
    Actual:-
    Order with Document type AA(My Invoice) and payment method V.
    1. The customer has paid with reference to the business partner number.
    2. The payment was posted on the account, and a document with document type ZU was posted.
    3. On clearing, the system generates 3 CL documents.
    I have no idea why the system generates 3 CL documents instead of creating one CL clearing document.
    Correct would be:-
    I have analysed a similar scenario in another system, where it worked perfectly without the extra CL documents. A single CL clearing document was created.
    RE-Invoice
    ZU- Account posting and no direct clearing.
    FPMA:- A CL clearing document is created.
    CL document is visible only in clearing data at the document level.
    Can you throw some light on it?
    Thanks in advance!
    Lakshmi

    Hi Amlan,
    I am glad that you have looked into the issue.
    Sorry, maybe I haven't explained it clearly enough. Under the title
    "Actual", I meant that
    1. By clearing the open invoice and credit note on the account, my understanding is that a CL document gets created, as attached in the last screenshot, without any CAX items.
    But in my first screenshot a new CL document with FI-CA items is created.
    Here the document class at the header level is " " (space).
    Why is a new clearing document created with FPMA with FI-CA items? This CL document in table DFKKOP has document class " " and does not have 1, which would say that there are no FI-CA items.
    I did not understand the background of it.
    Expected is:
    I expected a new CL document without FI-CA items, and with a document class saying that there are no FI-CA items, as in the last screenshot.
    Similar to that of Order 100130067.
    Thanks a ton and would be grateful for a reply.

  • BOM Details in to a Purchase and Sales Documents

    Hi
    How can I get BOM details into Purchase and Sales documents (with component item details)?

    Hi Chakrapani Bandaru,
    According to our knowledge, only BOM of 'Sales' type can be displayed with its components together in Sales documents. But this kind of BOM can not be selected in Purchase documents since they can not be 'Purchased Item' (Item Master Data).
    Additionally, BOM of 'Template' type can also display its components in Sales/Purchase documents, but in a different way. The parent and its children are parallel in the documents.
    Regards,
    Candice Ren
    SAP Business One Forums Team

  • Link Between delivery and material document

    Dear All,
    How do we link the delivery document and the material document,
    and the delivery document and the accounting document?
    with regards
    Mohammed Raees

    Hi there,
    The material doc is the PGI document, which is created whenever there is a material movement, either GR or GI. In VL02N --> document flow, select the GD goods issue and click on display document. It will take you to the material doc. There is a field called material slip in it. The value stored in the material slip is the delivery number. That is how you link the delivery and the material doc.
    From the material doc, if you click on accounting docs, it will give you the accounting document number and the costing doc number (if available).
    From the material doc, select the item and click on details from item; it will give you the G/L account to which the GI is posted.
    Regards,
    Sivanand

  • Document with all the changes to the JLS introduced by JDK 5.0

    Do you know if there is a document with all the Java Language Specification changes introduced by JDK 5.0? I think I've seen such a document somewhere on the java.sun.com website but I can't remember where.
    I wish there was an updated version of JLS.
    Thank you

    I found it. It is:
    http://java.sun.com/docs/books/jls/jls-proposed-changes.html

  • Generating word and PDF documents

    How can I generate Word and pdf documents with JDeveloper ?
    Iwan

    You already got answers on your previous thread:
    Create *.doc and *.pdf files with jDeveloper

  • Api POI read and replace my word document with java

    Hi everyone,
    I'd like to know if someone has a piece of code that reads a Word document with the POI API and could send it to me.
    I need to read and replace text in a Word document with Java (J2EE).
    Thanks


  • Problem in printing pdf document with java code

    Hi All
    I want to print a PDF document from Java code. I have used PDFRenderer.jar to compile my code.
    Code:
    // Needs java.io.*, java.nio.ByteBuffer, java.nio.channels.FileChannel and java.awt.print.*;
    // PDFFile and PDFPrintPage come from PDFRenderer.jar.
    File f = new File("C:/Documents and Settings/123/Desktop/1241422767.pdf");
    FileInputStream fis = new FileInputStream(f);
    FileChannel fc = fis.getChannel();
    // Map the whole file read-only so PDFFile can parse it
    ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    PDFFile pdfFile = new PDFFile(bb);
    // Create PDF print page
    PDFPrintPage pages = new PDFPrintPage(pdfFile);
    // Create print job; take the default page format from the same job
    PrinterJob pjob = PrinterJob.getPrinterJob();
    PageFormat pf = pjob.defaultPage();
    pjob.setJobName(f.getName());
    Book book = new Book();
    book.append(pages, pf, pdfFile.getNumPages());
    pjob.setPageable(book);
    // System.out.println(pjob.getPrintService());
    // Send print job to default printer
    pjob.print();
    But when I run my program I get this error:
    Exception in thread "main" java.awt.print.PrinterException: Invalid name of PrintService.
    Does anybody know the solution for this error?
    Thanks In Advance
    Indira

    It seems that either no default printer is set up, or you have too many printers, or no printer is set up at all. Try running the following code. It should print the list of available print services.
    import java.awt.print.PrinterJob;
    import javax.print.PrintService;

    public class PrintServiceNames {
        public static void main(String[] args) {
            // Look up every print service available to Java 2D printing
            PrintService[] printServices = PrinterJob.lookupPrintServices();
            for (int i = 0; i < printServices.length; i++) {
                System.out.println("P: " + printServices[i]);
            }
        }
    }
    From the list, pick one of the print services and set it explicitly, for example "printerJob.setPrintService(printServices[i]);", and then try running the program.
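    Picking a print service and setting it on the job explicitly could be sketched roughly as follows. This is only an illustration: the class name PrintServiceSelector, the helper findByName, and the idea of passing part of the printer name as a command-line argument are assumptions, not part of the original program.

```java
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.util.Locale;
import javax.print.PrintService;

/** Illustrative sketch: choose a print service by name and set it on a PrinterJob. */
final class PrintServiceSelector {

    /** Returns the first service whose name contains `fragment` (case-insensitive), or null. */
    static PrintService findByName(PrintService[] services, String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        for (PrintService service : services) {
            if (service.getName().toLowerCase(Locale.ROOT).contains(needle)) {
                return service;
            }
        }
        return null;
    }

    public static void main(String[] args) throws PrinterException {
        PrintService[] services = PrinterJob.lookupPrintServices();
        // With no argument the empty fragment matches the first service, if there is one
        PrintService chosen = findByName(services, args.length > 0 ? args[0] : "");
        if (chosen == null) {
            System.out.println("No matching print service found.");
            return;
        }
        PrinterJob job = PrinterJob.getPrinterJob();
        // setPrintService throws PrinterException if the job cannot use this service
        job.setPrintService(chosen);
        System.out.println("Using print service: " + chosen.getName());
    }
}
```

    In the original program, the same setPrintService call would go right after the PrinterJob is created and before print() is called.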
