External Multi-Way Merge Sort in Java!

Hi guys,
Some months ago I sent [this post|http://forums.sun.com/thread.jspa?forumID=31&threadID=5310310] to the Java Programming Forum to announce the availability of an External Multi-Way Merge Sort written in Java. It is tuned and tested on gigabytes of text data, released under the LGPL, and can help you deal with huge logs or CSV files.
Here I just want to let you know about this implementation.
You can run the sorting algorithm from the command line, or call it from any Java class by instantiating the ExternalSort class, setting its properties, and calling run().
Searching the forum, I saw many people asking for help sorting huge text files that don't fit in memory. My post solves this problem, but I think very few people are aware of it :( ....
I hope this helps anyone who needs this feature. See you!
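For readers new to the technique, an external multi-way merge sort can be sketched in a few dozen lines. This is not the ExternalSort class from the post (its internals aren't shown here); the class `ExternalMergeSortSketch`, the method `sortFile`, and the `maxLinesInMemory` limit are hypothetical. Phase 1 reads fixed-size chunks, sorts each in memory, and writes it to a temporary "run" file; phase 2 merges all runs at once with a min-heap that holds the current line of every run.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

/** Hypothetical sketch of an external multi-way merge sort over text lines. */
class ExternalMergeSortSketch {

    /** Sorts the lines of input into output, holding at most maxLinesInMemory lines per chunk. */
    static void sortFile(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> runs = createSortedRuns(input, maxLinesInMemory);
        try {
            mergeRuns(runs, output);
        } finally {
            for (Path run : runs) Files.deleteIfExists(run); // clean up temp files
        }
    }

    // Phase 1: split the input into chunks, sort each chunk, write it to a temp file.
    private static List<Path> createSortedRuns(Path input, int maxLines) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>(maxLines);
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == maxLines) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
        }
        return runs;
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    // Phase 2: k-way merge; the heap always holds the smallest unread line of each run.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<Map.Entry<String, BufferedReader>> heap =
                new PriorityQueue<>(Map.Entry.<String, BufferedReader>comparingByKey());
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new AbstractMap.SimpleEntry<>(first, r));
            }
            while (!heap.isEmpty()) {
                Map.Entry<String, BufferedReader> smallest = heap.poll();
                out.write(smallest.getKey());
                out.newLine();
                // Refill the heap from the run that just supplied the smallest line.
                String next = smallest.getValue().readLine();
                if (next != null) heap.add(new AbstractMap.SimpleEntry<>(next, smallest.getValue()));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }
}
```

Memory use is bounded by one chunk in phase 1 and one line per run in phase 2; a production tool would also cap how many runs are merged at once, since each run keeps a file open.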

Nice work this SmallText,
I didn't try it, but I like the idea.
I did something similar in C (a long, long time ago), and what made our compression really tight for typical CSV files was the decision to build one Huffman coder for each field. The trick was to include the field-separator symbol in the static Huffman coder of every field; that gives you an End Of Field signal telling you where to switch to the next decoder.
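To illustrate the per-field trick, here is a hedged sketch, not the original C implementation. The class and method names are hypothetical, and it assumes comma-separated rows without quoting and that the NUL character never occurs in the data, so NUL can serve as the end-of-field symbol. Each column gets its own Huffman table that also codes END_OF_FIELD; the decoder moves to the next column's table as soon as it decodes that symbol. Bits are kept in a String for readability; a real coder would pack them into bytes.

```java
import java.util.*;

/** Hypothetical sketch: one static Huffman code per CSV column, each with an end-of-field symbol. */
class PerFieldHuffmanSketch {
    static final char END_OF_FIELD = '\u0000'; // assumption: NUL never appears in the data

    private static final class Node {
        final long weight; final char symbol; final Node left, right;
        Node(long weight, char symbol, Node left, Node right) {
            this.weight = weight; this.symbol = symbol; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null; }
    }

    /** Counts symbols per column (plus one END_OF_FIELD per field) and builds one code table per column. */
    static List<Map<Character, String>> buildCodes(List<String> rows) {
        List<Map<Character, Long>> freqs = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split(",", -1);
            for (int i = 0; i < fields.length; i++) {
                if (freqs.size() <= i) freqs.add(new HashMap<>());
                Map<Character, Long> f = freqs.get(i);
                for (char c : fields[i].toCharArray()) f.merge(c, 1L, Long::sum);
                f.merge(END_OF_FIELD, 1L, Long::sum); // the separator is coded like any other symbol
            }
        }
        List<Map<Character, String>> codes = new ArrayList<>();
        for (Map<Character, Long> f : freqs) codes.add(huffman(f));
        return codes;
    }

    private static Map<Character, String> huffman(Map<Character, Long> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingLong((Node n) -> n.weight));
        freq.forEach((sym, w) -> pq.add(new Node(w, sym, null, null)));
        while (pq.size() > 1) { // repeatedly join the two lightest subtrees
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(a.weight + b.weight, END_OF_FIELD, a, b)); // symbol unused on inner nodes
        }
        Map<Character, String> table = new HashMap<>();
        assign(pq.poll(), "", table);
        return table;
    }

    private static void assign(Node n, String prefix, Map<Character, String> table) {
        if (n.isLeaf()) { table.put(n.symbol, prefix.isEmpty() ? "0" : prefix); return; }
        assign(n.left, prefix + "0", table);
        assign(n.right, prefix + "1", table);
    }

    /** Encodes a row column by column, ending each field with that column's END_OF_FIELD code. */
    static String encodeRow(String row, List<Map<Character, String>> codes) {
        StringBuilder bits = new StringBuilder();
        String[] fields = row.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            Map<Character, String> table = codes.get(i);
            for (char c : fields[i].toCharArray()) bits.append(table.get(c));
            bits.append(table.get(END_OF_FIELD));
        }
        return bits.toString();
    }

    /** Decodes one row, switching to the next column's table after each END_OF_FIELD. */
    static List<String> decodeRow(String bits, List<Map<Character, String>> codes) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (Map<Character, String> table : codes) {
            Map<String, Character> reverse = new HashMap<>();
            table.forEach((sym, code) -> reverse.put(code, sym));
            StringBuilder field = new StringBuilder(), current = new StringBuilder();
            while (true) {
                current.append(bits.charAt(pos++));
                Character sym = reverse.get(current.toString());
                if (sym == null) continue;        // prefix-free code: keep reading bits
                current.setLength(0);
                if (sym == END_OF_FIELD) break;   // switch to the next column's decoder
                field.append(sym);
            }
            fields.add(field.toString());
        }
        return fields;
    }
}
```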
For ExternalSort, I would suggest making it multithreaded. With so many cores and RAM so cheap today, it makes a lot of sense. An easy way to do it is to have IO threads and worker threads (the first do the reading and writing, the second do the sorting). This is powerful because you use many cores and can easily add extra processing (like compression, token frequency counting or whatever...) via callbacks in each phase (chunking/sorting/merging).
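A minimal sketch of the worker-thread idea, assuming an in-memory list stands in for the chunks an IO thread would read from disk: each chunk is sorted concurrently on an `ExecutorService`, and the sorted chunks are then merged with a heap on the calling thread. `ParallelChunkSortSketch` and its parameters are hypothetical; the separate IO threads and per-phase callbacks from the suggestion are left out.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical sketch: sort chunks on worker threads, then k-way merge them. */
class ParallelChunkSortSketch {

    static List<String> sortInChunks(List<String> lines, int chunkSize, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Worker phase: each chunk is copied and sorted on a pool thread.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                List<String> chunk = new ArrayList<>(
                        lines.subList(start, Math.min(start + chunkSize, lines.size())));
                pending.add(pool.submit(() -> {
                    Collections.sort(chunk);
                    return chunk;
                }));
            }
            List<List<String>> chunks = new ArrayList<>();
            for (Future<List<String>> f : pending) chunks.add(f.get());

            // Merge phase: heap of {chunkIndex, position} cursors ordered by the value they point at.
            PriorityQueue<int[]> heap = new PriorityQueue<>(
                    Comparator.comparing((int[] c) -> chunks.get(c[0]).get(c[1])));
            for (int i = 0; i < chunks.size(); i++) heap.add(new int[] {i, 0});
            List<String> merged = new ArrayList<>(lines.size());
            while (!heap.isEmpty()) {
                int[] cur = heap.poll();
                List<String> source = chunks.get(cur[0]);
                merged.add(source.get(cur[1]));
                if (cur[1] + 1 < source.size()) heap.add(new int[] {cur[0], cur[1] + 1});
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real external sort each worker would write its sorted chunk to a temporary file instead of returning it, and the merge would read from those files.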
Again, congrats, really nice work!

Similar Messages

  • How to call a C sort function to sort a Java Array.

    My name is David, and I'm interning this summer doing some High Performance Computing work. I'm significantly out of my comfort zone here; I am primarily a network/network security geek, not a programming guy. I took one Java-based class called Problem Solving with Programming, where we wrote about 3 programs in Java and did everything else in pseudocode and with a program called Alice http://www.alice.org/ to do things graphically. I learned basically no actual programming syntax. I have also done some self-taught Perl, but only from one book, and I only got about halfway through it. So my programming expertise is pretty much null.
    That being said, I am currently tasked with figuring out how to make JNI work. Specifically, I need to write an array in Java and design a C program that can be called through JNI to sort the array. I have chosen to work with the Merge Sort algorithm. My method of coding is not to write the entire thing from scratch; I don't particularly need to master languages at this point, I just need to make them work. I am interested in learning, but time is of the essence for me right now. So far, what I have done is take sample code and tweak it to meet my purpose. However, I am currently unable to make things work, so I am asking for help.
    I am going to paste 3 pieces of code here: the first is my basic self-written instructions for JNI (Hello World Instructions), the second is my Java Array, and the third is my MergeSort function. I am not asking you to DO my work for me by telling me how to manipulate my code, but rather to point me in the direction of resources that will be of some aid. Links, books (preferably e-books so I don't have to go to a library), anything you can send my way that may help will be deeply appreciated. Thanks so much!
    JNI Instructions:
    /*The process for calling a C function in Java is as follows:
    1) Write the Java program, e.g. HelloWorld.java
    2) Compile it: javac HelloWorld.java
    3) Create a header file: javah -jni HelloWorld
    4) Create a C program, e.g. HelloWorld.c
    5) Compile the C program into a shared library, e.g. libhello.so (my specific command is cc -m32 -I/usr/java/jdk1.7.0_05/include -I/usr/java/jdk1.7.0_05/include/linux -shared -o libhello.so -fPIC HelloWorld.c)
    6) Copy the library to a directory on java.library.path or LD_LIBRARY_PATH (in my case I have set it to /usr/local/lib).
    7) Run ldconfig (/sbin/ldconfig)
    8) Run the java program: java HelloWorld. */
    //Writing the code:
    //For the HelloWorld program:
    //In java:
    //You need to name a class:
    class HelloWorld {
        //You then need to declare a native method:
        public native void displayHelloWorld();
        //You now need a static initializer:
        static {
            //Load the library:
            System.loadLibrary("hello");
        }
        /*Main function to call the native method (call the C code)*/
        public static void main(String[] args) {
            new HelloWorld().displayHelloWorld();
        }
    }
    //In C:
    #include <jni.h> //JNI header
    #include "HelloWorld.h" //Header created by the javah -jni command
    #include <stdio.h> //Standard input/output header for C.
    //Now we must use a portion of the code provided by the JNI header.
    //Naming convention: Java_<ClassName>_<methodName>
    JNIEXPORT void JNICALL
    Java_HelloWorld_displayHelloWorld(JNIEnv *env, jobject obj)
    {
        printf("Hello World!\n");
        return;
    }
    Java Array:
    class JavaArray {
         private native int MergeSort(int[] arr);
         public static void main(String[] args) {
             int arr[] = {7, 8, 6, 3, 1, 19, 20, 13, 27, 4};
         }
         static {
             System.loadLibrary("MergeSort");
         }
    }
    Hacked and pieced together crappy C Merge Sort code:
    #include <jni.h>
    #include <stdio.h>
    #include "JavaArray.h"
    JNIEXPORT jint JNICALL
    Java_JavaArray_MergeSort(JNIEnv *env, jobject obj, jintArray arr[],jint low,jint mid,jint high)
    {
       jint i,j,k,l,b[10];
       l=low;
       i=low;
       j=mid+1;
       while((l<=mid)&&(j<=high)){
           if(arr[l]<=arr[j]){
              b[i]=arr[l];
              l++;
           }
           else{
              b[i]=arr[j];
              j++;
           }
           i++;
       }
       if(l>mid){
           for(k=j;k<=high;k++){
              b[i]=arr[k];
              i++;
           }
       }
       else{
           for(k=l;k<=mid;k++){
              b[i]=arr[k];
              i++;
           }
       }
       for(k=low;k<=high;k++)
           arr[k]=b[k];
    }
    void partition(jint arr[],jint low,jint high)
    {
       jint mid;
       if(low<high){
           mid=(low+high)/2;
           partition(arr,low,mid);
           partition(arr,mid+1,high);
           sort(arr,low,mid,high);
       }
    }

    You're doing OK so far up to here:
    Java_JavaArray_MergeSort(JNIEnv *env, jobject obj, jintArray arr[],jint low,jint mid,jint high)
    This is not correct. It is not what was generated by javah. It would have generated this:
    Java_JavaArray_MergeSort(JNIEnv *env, jobject obj, jintArray arr,jint low,jint mid,jint high)
    A 'jintArray' is already an array, embedded in an object. You don't have an array of them.
    So you need to restore that, and the header file, the way 'javah' generated them, then adjust your code to call GetIntArrayElements() to get the elements out of 'arr' into a local int[] array, sort that, and then call ReleaseIntArrayElements() to put them back.

  • Multi-lingual Translator component in java?

    Hello all,
    I want to make a multi-lingual translator component in Java which can be used to convert any existing web site (developed using any technology, ASP/JSP etc.) into any foreign language, viz. German, Spanish, Italian and other major languages.
    Please let me know how I should proceed in Java with respect to the following:
    1. It has to be developed as a component in Java so that it can be included in any existing web site as a control which simply gives users the option of converting either the current web page or the entire web site from English into another language.
         How do I develop it as a component so that it can be independently merged into any existing web site (whether ASP or JSP) as a control? Which technology should I be using here: EJBs, applets, or something else?
    I personally think it could be an applet, because EJBs require a container/application server.
    What do you all suggest?
    2. I don't want to use any of the free translators available on the net these days, because a lot of them redirect to their own translation page and include their own banners and/or advertisements, which is not acceptable for us, as we have to include this utility on our company's web sites.
    3. Approximately how much time should it take to develop?
    4. If there's any free tool available without the limitations mentioned in point 2, I am not averse to using it; please let me know if such a tool is available.
    5. Please also let me know if a multi-lingual component in Java already exists with source code freely available anywhere on the net, and if so, give me the link. This will help me save a lot of time in developing it.
    Please let me know the answers to the above queries; you'll be doing me a great deal of help here.
    Thanks in advance
    sd76

    JS supports UTF-8... assuming the browser has the proper fonts.
    // _lang.jsp
    <%@ page language="java" contentType="text/html; charset=UTF-8" %>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
    <head>
         <title>Language Test</title>
         <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body bgcolor="#ffffff" background="" text="#000000" link="#ff0000" vlink="#800000" alink="#ff00ff">
    <%
    request.setCharacterEncoding("UTF-8");
    String str = "\u7528\u6237\u540d";
    String name = request.getParameter("name");
    // OR instead of setCharacterEncoding...
    //if(name != null) {
    //     name = new String(name.getBytes("ISO8859_1"), "UTF8");
    //}
    System.out.println(application.getRealPath("/"));
    System.out.println(application.getRealPath("/src"));
    %>
    req enc: <%= request.getCharacterEncoding() %><br />
    rsp enc: <%= response.getCharacterEncoding() %><br />
    str: <%= str %><br />
    name: <%= name %><br />
    <script language="Javascript">
    alert('<%= name %>'); // should show correctly
    </script>
    <form method="POST" action="_lang.jsp">
    Name: <input type="text" name="name" value="" >
    <input type="submit" name="submit" value="Submit POST" />
    </form>
    <br />
    <form method="GET" action="_lang.jsp">
    Name: <input type="text" name="name" value="" >
    <input type="submit" name="submit" value="Submit GET" />
    </form>
    </body>
    </html>
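    The commented-out alternative in the page re-decodes a parameter that the container decoded as ISO-8859-1 even though the browser sent UTF-8 bytes. It works because ISO-8859-1 maps every byte to exactly one char, so getBytes("ISO8859_1") recovers the original bytes unchanged. A minimal Java sketch of that step, using `StandardCharsets` constants instead of charset-name strings (the class `ParamRecodeSketch` is hypothetical):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper mirroring the commented-out re-decoding in the JSP above. */
class ParamRecodeSketch {
    /** Re-decodes text that was wrongly decoded as ISO-8859-1 although the bytes were UTF-8. */
    static String recode(String misdecoded) {
        if (misdecoded == null) return null; // parameter was absent
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1); // lossless: one char per byte
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

    Calling request.setCharacterEncoding("UTF-8") before the first getParameter() call, as the page does, avoids the need for this workaround.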

  • What's the best way to sort query results?

    Hello All,
    I have a standard CRUD UI that now needs the results sorted, based on input from the UI. What's the best way to sort results by entity fields?
    I'm familiar with the conventional Java approach using a TreeSet and a comparator. Is this the best route, or does BDB JE offer more convenient or better-performing alternatives?
    I looked through the documentation and saw how to change the indexes, but I'm just looking to change the order in the scope of the query.
    If my application were an address book, the UI would be sortable, ascending and descending, by date added, first name, last name, etc. based on which column the user clicked on.
    Thanks in advance,
    Steven
    Harvard Children's Hospital Informatics Program

    Hi Steven,
    Using standard Java collections is probably the best approach.
    One thing that may be useful is to get a SortedMap (Primary or SecondaryIndex.sortedMap) for the case where you have an index sorted the way you want, and then copy the primary index into a TreeMap (with a comparator) for other cases. That way, you always have a SortedMap to work with, whether you're copying the index or not.
    I haven't tried this myself, just a thought.
    --mark
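    A hedged sketch of the "standard Java collections" approach for the address-book example. `Contact`, its fields, and `ContactSorter` are hypothetical, and the sketch sorts a copied `List` with a `Comparator` instead of building a `TreeMap`: a `TreeMap` whose comparator looks only at, say, last name treats two contacts with the same last name as the same key and keeps just one of them. Ties are broken on the primary key, and the descending direction reverses only the clicked column.

```java
import java.time.LocalDate;
import java.util.*;

/** Hypothetical address-book entity (in BDB JE this would be the stored entity class). */
class Contact {
    final long id;
    final String firstName, lastName;
    final LocalDate dateAdded;

    Contact(long id, String firstName, String lastName, LocalDate dateAdded) {
        this.id = id; this.firstName = firstName; this.lastName = lastName; this.dateAdded = dateAdded;
    }
}

class ContactSorter {
    enum Column { FIRST_NAME, LAST_NAME, DATE_ADDED }

    /** Returns a sorted copy for the clicked column; the primary key breaks ties. */
    static List<Contact> sorted(Collection<Contact> contacts, Column column, boolean ascending) {
        Comparator<Contact> byColumn;
        switch (column) {
            case FIRST_NAME: byColumn = Comparator.comparing((Contact c) -> c.firstName); break;
            case LAST_NAME:  byColumn = Comparator.comparing((Contact c) -> c.lastName); break;
            default:         byColumn = Comparator.comparing((Contact c) -> c.dateAdded); break;
        }
        if (!ascending) byColumn = byColumn.reversed();
        List<Contact> result = new ArrayList<>(contacts);
        result.sort(byColumn.thenComparingLong(c -> c.id));
        return result;
    }
}
```

    The input collection could be the values of an index's map view (for example the sortedMap() mentioned above); that wiring is not shown here.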

  • A problem on merge sort

    This merge sort test doesn't run as expected.
    Can anyone check it for me? Thank you.
    Source code attached
    //MergeSort.java
    //application to show the mergesort
    public class MergeSort{
         public static void main(String [] args){
              if(args.length == 0){
                   System.err.println("Format: java mergeSort [number1] [number2]...");
                   return;
              }
              int [] sortList = new int[args.length];
              try{
                   for(int i = 0 ; i < args.length ; i ++){
                        sortList[i] = Integer.parseInt(args[i]);
                   }
              }
              catch(NumberFormatException ex){
                   System.err.println("Input format error, only integer numbers before error been sorted");
              }
              mergeSort(sortList, 0, sortList.length-1);
              System.err.print("Sorted list : ");
              for(int i = 0 ; i < sortList.length ; i ++)
                   System.err.print(sortList[i] + " ");
         }
         public static void mergeSort(int [] sortList, int first, int last){
              if(first < last){
                   int mid = (first + last) / 2;
                   mergeSort(sortList, first, mid);
                   mergeSort(sortList, mid + 1, last);
                   merge(sortList, first, mid, last);
              }
         }
         private static void merge(int [] sortList, int first, int mid, int last){
              int indexLeft = first;
              int indexRight = mid + 1;
              int indexTemp = 0;
              boolean isEmpty = false;
              int [] tempList = new int[sortList.length];
              while(!isEmpty){
                   if(sortList[indexLeft] < sortList[indexRight]){
                        tempList[indexTemp] = sortList[indexLeft];
                        indexLeft++;
                   }
                   else{
                        tempList[indexTemp] = sortList[indexRight];
                        indexRight++;
                   }
                   indexTemp ++ ;
                   if(indexLeft > mid||indexRight > last)
                        isEmpty = true;
              }
              while(indexLeft <= mid){
                   tempList[indexTemp] = sortList[indexLeft];
                   indexTemp++;
                   indexLeft++;
              }
              while(indexRight <= last){
                   tempList[indexTemp] = sortList[indexRight];
                   indexTemp++;
                   indexRight++;
              }
              for(int i = first ; i <= last ; i++)
                   sortList[i] = tempList[i];
         }
    }

    Use code tags.
    Does this look like a debug for free forum to you?
    What isn't working? A compile error? I see a syntax error in your code.
    What is the input, what is the output?
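    Besides the formatting problems, the posted merge() has a logic bug: it fills tempList starting at index 0 but copies back with sortList[i] = tempList[i] for i from first, so every merge that does not start at index 0 copies the wrong slots. A corrected sketch (renamed `MergeSortFixed` so it doesn't clash with the original class) sizes the buffer to the merged range and copies back with an offset:

```java
/** Corrected merge sort sketch; renamed so it does not clash with the poster's MergeSort class. */
class MergeSortFixed {
    static void mergeSort(int[] sortList, int first, int last) {
        if (first < last) {
            int mid = (first + last) / 2;
            mergeSort(sortList, first, mid);
            mergeSort(sortList, mid + 1, last);
            merge(sortList, first, mid, last);
        }
    }

    private static void merge(int[] sortList, int first, int mid, int last) {
        int[] tempList = new int[last - first + 1]; // buffer covers only first..last
        int indexLeft = first, indexRight = mid + 1, indexTemp = 0;
        while (indexLeft <= mid && indexRight <= last) {
            // <= takes from the left half on ties, which keeps the sort stable
            if (sortList[indexLeft] <= sortList[indexRight]) tempList[indexTemp++] = sortList[indexLeft++];
            else tempList[indexTemp++] = sortList[indexRight++];
        }
        while (indexLeft <= mid) tempList[indexTemp++] = sortList[indexLeft++];
        while (indexRight <= last) tempList[indexTemp++] = sortList[indexRight++];
        // tempList[0] corresponds to sortList[first], so copy back with an offset
        System.arraycopy(tempList, 0, sortList, first, tempList.length);
    }
}
```

    Usage is the same as in the original main(): MergeSortFixed.mergeSort(sortList, 0, sortList.length - 1).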

  • Group sort / merge sort

    Is there any way I can do a group sort/merge sort through PL/SQL?
    Output Example:
    number course
    1000   MATH
           SCI
           LANG
    1040   MATH
           SCI
    Can we do this?

    Are you looking for this?
    Here is an example from the emp table in SQL*Plus:
    SQL> break on dno
    SQL> select dno,ename from emp where dno in(10,20)
      2  order by 1;
           DNO ENAME
            10 CLARK
               KING
               MILLER
            20 SMITH
               ADAMS
               FORD
               SCOTT
               JONES

  • Is there a way to create a Java class based on what's defined in a schema?

    I have a set of schemas that define what messages I will get as an XML stream (sent as UDP packets). Is there a way to parse the schemas into a set of Java classes that match the fields in the schema? If so, the next thing would be to feed a string or byte array to instances of these classes and have a method that parses it and fills in all the fields. But is the first part possible? Somebody linked http://www.cafeconleche.org/books/xmljava/chapters/ in another thread, but that seems to be for regular XML, not schemas, unless I'm confused, which is entirely possible; I'm new to XML and to using it in Java this way.

    For future reference, [Java API for XML Binding|http://www.dummies.com/WileyCDA/DummiesArticle/Building-Custom-Code-with-Java-API-for-XML-Binding-JAXB-.id-1489,subcat-BUILDING.html] (JAXB) allows for [generation of Java classes|http://www.oracle.com/technology/pub/articles/marx-jse6.html] from XML Schema (or even DTD) using its xjc compiler.

  • Is there a way to sort a playlist on my iPhone alphabetically by Artist?

    Is there a way to sort a playlist on my iPhone alphabetically by Artist?  I can do it when I view the playlist through iTunes on my computer, but when I view it on my iPhone's iPod, it is sorted by the date the songs were added.  I add the songs through iTunes, and I can sort it by Artist with no problem in the iTunes window.  However, when I look at the playlist through the iPod on my iPhone, the new songs are down at the bottom, no matter who the Artist is.
    If it matters I am running iOS 4.3.10 on an iPhone 4 and I have iTunes 10.4.0.80 on my computer.
    Thanks.

    Ottonomy, here's what to do.
    Connect your iPhone to your computer and open the iPhone's playlist in an iTunes window. Copy/Paste the entire playlist into a new(!) playlist in iTunes, not on your iPhone. Then re-sort the playlist. After you sort it, Copy/Paste it into a new(!) playlist on your iPhone. Now delete the old playlist on your iPhone and rename the new playlist to the name you want (probably the same name as the old playlist).
    The downside to this method is that you have to do it every time you add a song to the playlist. Unfortunately, I don't think there's another way.

  • An easy way to sort a HashMap by Key?

    Hi everyone,
    Is there any way to sort a HashMap by the key value using API 1.3.1?
    Thanks.

    Try using a TreeMap implementation instead. It'll maintain the keys in sorted order, as long as they have a Comparable implementation to tell the data structure how to sort them.
    If you've written your code properly, it'll be an easy change:
    // instead of this
    // Map myData = new HashMap();
    // use this.
    Map myData = new TreeMap();
    As long as you've used Map as the reference type, it's easy to switch.
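    Two small variations, sketched with hypothetical method names: copying an existing HashMap into a TreeMap (the TreeMap(Map) constructor has existed since Java 1.2, so the raw-type form new TreeMap(map) works on 1.3.1), and getting a different order by passing a Comparator. The sketch itself uses generics and Comparator.reverseOrder(), so it needs Java 8 or later as written.

```java
import java.util.*;

/** Hypothetical helpers showing TreeMap-based key ordering. */
class SortByKeySketch {
    /** Copies any Map into a TreeMap, which keeps keys in their natural (Comparable) order. */
    static SortedMap<String, Integer> sortedByKey(Map<String, Integer> unsorted) {
        return new TreeMap<>(unsorted);
    }

    /** Same idea with an explicit Comparator, here for descending key order. */
    static SortedMap<String, Integer> sortedByKeyDescending(Map<String, Integer> unsorted) {
        SortedMap<String, Integer> result = new TreeMap<>(Comparator.<String>reverseOrder());
        result.putAll(unsorted);
        return result;
    }
}
```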

  • Consuming External Web Services in Web Dynpro Java

    Hi All,
    I am trying to consume an external web service in Web Dynpro Java using an Adaptive Web Service Model,
    but I get the error below when executing the web service:
    Exception on execution of web service with WSDL URL 'D:\Web Service Project\CurrencyConvertor.asmx.xml' with operation 'ConversionRate' in interface 'CurrencyConvertorSoap'
    The steps I followed are as follows:
    1. Created an Adaptive Web Service Model; for this I selected "Local File System or URL" as the WSDL source.
    In the next step I selected the "No logical destination" radio button and clicked Next.
    In the next step I browsed to the WSDL file and successfully imported it.
    2. After successfully importing the WSDL file, I wrote the code below in the Init method:
    WebModel modelweb = new WebModel();               
    Request_ConversionRate reqConversion = new Request_ConversionRate(modelweb);
    ConversionRate conversion= new ConversionRate(modelweb);
    reqConversion.setConversionRate(conversion);
    wdContext.nodeRequest_ConversionRate().bind(reqConversion);
    3.After that execute the model - code is given below :
        IWDMessageManager manager = wdComponentAPI.getMessageManager();
        try {
          wdContext.currentRequest_ConversionRateElement().modelObject().execute();
          wdContext.nodeResponse().invalidate();
          wdContext.nodeConversionRateResponse().invalidate();
        } catch(Exception e) {
          manager.reportException(e.getMessage(), false);
        }
    Please let me know how to resolve this.
    Thanks
    Sandy

    Hi,
    You need to use destinations for metadata and model data.
    Configure those destinations in the Visual Administrator.
    You can refer to the following link:
    https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/docs/library/uuid/b2bc0932-0d01-0010-6d8e-cff1b2f22bc7
    Regards,
    Shruti.

  • Is there a way to sort apps compatible with the 1st-generation iPod touch?

    Is there a way to sort apps on iTunes to show only apps compatible with the 1st-generation iPod touch?

    Sorry but no.

  • Is there any way to sort songs in iTunes library between mp3 versions and m4a versions.  Any help/guidance would be appreciated.

    Is there any way to sort the iTunes library between m4a and mp3 versions of songs? If there is, any instructions and guidance would be most appreciated. Thanks in advance.

    Hey, here's an article I found. It might help:
    Converting your iTunes purchases to another format:
    http://support.apple.com/kb/HT1550

  • Is there a way to sort by date imported?

    Hi everybody,
    Is there a way to sort photos by the date they were imported, rather than by EXIF data?
    If iPhoto 6 does not offer this option, does iPhoto '09?
    The reason I ask is that not all my photos are from cameras. Some are scanned, and others are from e-mails. I'd also like the ability to simply ignore EXIF data since so many cameras are set to an incorrect date.
    Thanks beautiful people!

    No there isn't.
    However, you can correct the dates on shots taken with cameras (Photos -> Adjust date and time) and add dates to scanned photos.
    Regards
    TD

  • I cannot find a way to sort the bookmark folders themselves alphabetically by name. I am not talking about a view mode but about the way they are displayed when I click on my bookmarks tab. Can someone explain to me how to accomplish this?

    I have a lot of bookmark folders with websites contained within each folder. I am able to sort the websites within each folder alphabetically by name, but I cannot find a way to sort the bookmark folders themselves alphabetically by name. I am not talking about a view mode but about the way they are displayed when I click on my bookmarks tab. Can someone explain to me how to accomplish this other than by manually dragging them? That is extremely hard for me because I am a quadriplegic with limited hand dexterity.

    Bookmark folders that you created are in the Bookmarks Menu folder. "Sort" that folder.
    http://kb.mozillazine.org/Sorting_bookmarks_alphabetically

  • Is there a way to sort my songs by import dates?

    Is there a way to sort my songs by import date? The only options I have now are to sort by name and artist. I want to be able to see the songs I added recently and also play them in that order.
    Thank you

    Yes. It's called Date Added. In Song view enable the column by pulling down View > View options and select "Date Added."
