Complexity of searching in binary files

Hello!
I have a question regarding complexity when searching in a binary file. I have an app that needs to search for a certain string in a large binary file (> 1 GB). At the moment, I read one line of the file into a byte array, convert it to a string and then look for matches with indexOf. This, however, is (yeah, you guessed right...) an extremely slow search algorithm. At the time I wrote it, it was a good option, but now I am thinking of redoing it.
So my question: is it worth trying something else? Will it be faster? Is it even possible? Can one use regexps on binary files? I'm a bit confused in this matter...
All ideas are welcome!

Yes, it's true. ASCII characters are readable. Maybe I should have used another word for "line". 1 line = 1 frame, i.e. each frame has a line feed attached.
The tricky part is that each frame contains a 4-byte preamble followed by a unique identifier, which I must keep track of. That is, a search hit in one frame should return the identifier of the frame.
Thanks for the tip about the search algorithms!
I believe the question is: How do I find a specific string in a byte array?

Convert the search string to a byte array and then search for that byte array in the file byte array. This way you are only converting once.
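
A minimal sketch of that advice, in Python rather than the poster's language. The 4-byte preamble and the line-feed frame terminator come from the post; the fixed 8-byte identifier length, the sample frames and the name find_frames are assumptions made only for illustration.

```python
import io

PREAMBLE_LEN = 4   # from the post: each frame starts with a 4-byte preamble
ID_LEN = 8         # assumption: fixed 8-byte identifier (length not stated in the post)

def find_frames(stream, needle: bytes):
    """Yield the identifier of every frame whose payload contains needle."""
    for frame in stream:  # a binary stream iterates frame by frame, split on b"\n"
        if needle in frame[PREAMBLE_LEN + ID_LEN:]:  # bytes-vs-bytes search
            yield frame[PREAMBLE_LEN:PREAMBLE_LEN + ID_LEN]

# Encode the search string once; no frame is ever converted to a string.
needle = "ERROR".encode("ascii")

# Hypothetical two-frame sample standing in for open(path, "rb").
sample = io.BytesIO(
    b"\x01\x02\x03\x04ID000001payload ok\n"
    b"\x01\x02\x03\x04ID000002disk ERROR 17\n"
)
hits = list(find_frames(sample, needle))
print(hits)
```

The search still reads every byte of the file, so its cost grows linearly with file size; what goes away is the per-frame string conversion.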

Similar Messages

  • Write output to binary file in C/C++?

    How can I write output to a file in binary format instead of ascii?
    Or... if I am using a buffer.. to write a buffer to binary format instead of ascii?
    I am writing a lot of data to a text file and I would like to reduce the size by writing it all into a binary file instead of ascii?
    Last edited by kdar (2012-12-13 14:36:41)

    Basic I/O?
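    #include <cstdio>
    #include <fstream>
    char buffer[100] = {};
    // C++ style: open in binary mode and write the raw bytes
    std::ofstream file;
    file.open("data.bin", std::ios::out | std::ios::binary);
    file.write(buffer, sizeof buffer);
    file.close();
    /* C style: "wb" = write, binary */
    FILE* file2 = fopen("data2.bin", "wb");
    fwrite(buffer, sizeof buffer, 1, file2);
    fclose(file2);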
    A lot of people prefer the old C-style functions for binary I/O even in C++ (and iostream for formatted I/O). Writing and reading more complex structures requires some casting, in C++ usually done with reinterpret_cast<char*>(&someStructure).

  • "Read From Binary File" function Help ambiguity

    I must be getting tired, but for some reason a doubt crept in my mind as I was designing a new piece of code this morning:
    "is the "Read From Binary File" using the last file position or is it starting from the beginning of the file?"
    "That's a stupid question", I told myself.
    "I used this function a million times and have always assumed it is reusing the last file position. Moreover, there is no file offset input to that function, so WTH am I afraid of?"
    So, for kicks, I fired up the Help window and read the following description (*):
    Reads binary data from a file and returns it in data. How the data is read depends on the format of the specified file. This function does not work for files inside an LLB.
    (*) BTW, has anybody ever complained that you can't select and copy anything from the floating Help Window?
    Not much there. I particularly admire the phrasing of the second sentence... What about: "This function can do a lot of things, but it would be much too complex to describe in extensive detail, so if you are asking, you probably can't afford to use it"?
    Anyhow, I clicked on the "Detailed Help" and got this (among other things):
    Use the Set File Position function if you need to perform random access.
    WHAT? I am pretty darn sure I DO NOT USE the Set File Position when I read a file in successive and contiguous chunks. I just pass the file refnum into a shift register and back to the function and that's it.
    Now, the description of the "Refnum Out" output says: If file is a refnum or if you wire refnum out to another function, LabVIEW assumes that the file is still in use until you close it. Translated into plain English, is that supposed to mean that if the file is not closed it is open, or is it implying that it contains more info than just "the file is open and can be found here"?
    I started searching around and finally ended up with the entry for "refnums, file I/O". Down the bottom of the (long) article, I found this under the heading "References to Objects or Applications" (but nothing specific to files, BTW):
    ...LabVIEW creates a refnum associated with that file, device, or network connection...
    [...]  LabVIEW remembers information associated with each refnum, such as the current location for reading from or writing to the object and the degree of user access, so you can perform concurrent but independent operations on a single object. If a VI opens an object multiple times, each open operation returns a different refnum. LabVIEW automatically closes refnums for you when a VI finishes running, but it is a good programming practice to close refnums as soon as you are finished with them to most efficiently use memory and other resources.
    So it seems that my recollection was correct. I do not know what the "degree of user access" for a file is, but that's not the topic of today's post. 
    So, my point is: the Help File for this function is incomplete or ambiguous at best. Please correct it. And provide a link to the "refnum, file I/O" Help entry in its detailed Help. It would H E L P...
    Thanks for reading.

    Reading in successive chunks is *NOT* random access. An open file always has
    a current position, which is updated with each read or write operation.
    You only need to set the file position if you want to start elsewhere.
    LabVIEW Champion . Do more with less code and in less time .
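
    The same behaviour can be observed with ordinary file handles outside LabVIEW. The following is a small Python sketch (an illustration only, not LabVIEW code) in which successive reads continue from the current position and an explicit seek is needed only to jump elsewhere. The scratch file and variable names are made up for the example.

```python
import os
import tempfile

# Write 10 known bytes to a scratch file so the reads below are predictable.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(10)))

positions = []
with open(path, "rb") as f:
    first = f.read(4)            # reads bytes 0..3
    positions.append(f.tell())   # the handle remembers its position
    second = f.read(4)           # continues where the last read stopped, no seek
    positions.append(f.tell())
    f.seek(2)                    # explicit seek: only needed for random access
    jumped = f.read(2)           # reads bytes 2..3
os.remove(path)

print(first, second, positions, jumped)
```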

  • Read binary file in as string to perform RegEx

    I have a python script that I use to gather data from a particular data file. I want to convert the script into a java application so I can add a gui and make it more distributable. I am particularly struggling on how to read the binary file into java and convert it to a string for the regex searching.
    import re
    dat = 'c:/projects/test.dat'
    all_the_data = open(dat,'rb').read()
    pattern = 'title="(.*?)".*?value="(.*?)"'
    rx = re.compile(pattern, re.IGNORECASE|re.MULTILINE|re.DOTALL)
    result = rx.findall(all_the_data)
    for title, value in result:
        print "%-25s: %s" % (title, value)

    Hi infraray.
    Check my post out: Cyclomatic Complexity Using Regex .
    This will help.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexSearch {
        public static void main(String[] args) throws IOException {
            BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
            System.out.println("Enter file name to be read: ");
            String txtFileName = keyboard.readLine();
            // Your pattern goes here
            Pattern pattern = Pattern.compile("if|for|while|case|switch", Pattern.MULTILINE);
            Matcher m = pattern.matcher(""); // reused for every line via reset()
            // try-with-resources closes the buffered reader
            try (BufferedReader reader = new BufferedReader(new FileReader(txtFileName))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    m.reset(line);
                    if (m.find()) {
                        // Change the line below to suit your need
                        System.out.println("Your message found, match starts at index " + m.start() + " and ends at index " + m.end());
                    }
                }
            }
        }
    }
    Hope this gives you some direction.
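
    As a side note on the original script: it is Python 2, where the file contents are already a byte string. A hedged sketch of the same search in Python 3 is shown below, where the pattern itself must be a bytes pattern so that it can run over raw binary data. The sample data is hypothetical and stands in for the contents of the poster's test.dat.

```python
import re

# Hypothetical sample standing in for open("c:/projects/test.dat", "rb").read()
all_the_data = b'\x00\x01title="Speed"\xff\xfe value="42"\x00TITLE="Mode"\n value="auto"\x02'

# A bytes pattern (note the rb prefix) is matched against raw bytes directly;
# DOTALL lets "." also match newline bytes between title and value.
rx = re.compile(rb'title="(.*?)".*?value="(.*?)"', re.IGNORECASE | re.DOTALL)

# Decode only the captured groups, which are expected to be ASCII text.
results = [(t.decode("ascii"), v.decode("ascii")) for t, v in rx.findall(all_the_data)]
for title, value in results:
    print("%-25s: %s" % (title, value))
```

    In Java, one commonly used equivalent is to decode the file bytes as ISO-8859-1, which maps every byte to exactly one char, and pass the resulting string to java.util.regex.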

  • Binary file of unknown formats

    Hello everybody,
    I posted a message earlier today but unfortunately I failed to give more details about my problem. I am sorry about that. I am a new user of LabView so I still have many things to learn. What I am trying to do is use LabView (8.2) to read a file that was created by UVP Monitor, V3.0 (*.mfprof file).
    This package saves fluid velocity information. There are 65 channels that form a velocity profile in a pipe. Also, there are 4096 velocity profiles; each velocity profile was captured at a time rate of 9 ms. I don’t know what the format of these values is.
    At present, I obtain a text file from UVP Monitor, then process the data using FORTRAN and finally plot the results. I want to avoid this and read the UVP file, process, analyze and plot the data using only LabView. I want to read individual experimental parameters as well as the bulk of the data. In this way the user has different options to choose from before processing and plotting the results. I actually built a VI that processes and plots the text files and it runs great.
    The problem I am facing now is that I want to read the UVP file but I have no information about the structure of it. I am assuming it is a binary file but that is all I can say about it.
    The header and footer seem to be in text format, that is all I can read; however, the body of the file I am not yet able to read.
    The information from the file is sent to the type cast function. Then I set a string indicator in the “Normal” option in Properties. I can read the header and footer but not the body of the file; for the body of the file all I can see is characters of the type
    “C¾ Ü Qþ  X@ “. One of the other options, Password, seems to provide an output of just the characters that were originally written, but then again, I can not read the characters because it is only asterisks and I can not copy them onto a word processor either.
    I have tried other options as well, such as Variant to flattened string and flattened string to variant, flatten to string and unflatten from string. Unfortunately none of them worked.
    I am attaching the VI I wrote and a sample UVP file , I would appreciate if someone could explain to me how to deal with  complex binary files in LabView and share an example with me if possible.
    Regards,
    Roberto
    Attachments:
    BINARY Read UVP file.vi ‏45 KB
    Test UVP file.zip ‏693 KB

    OK, here's a quick draft how you could do it in LabVIEW. Modify as needed, e.g. slice out the interesting columns, etc.
    The header has some binary data that may, or may not, mean something useful.
    See if the data "looks" about right.
    Message Edited by altenbach on 04-26-2007 08:13 PM
    Message Edited by altenbach on 04-26-2007 08:15 PM
    Attachments:
    BinaryRead.png ‏10 KB
    BINARY Read UVP file2.vi ‏46 KB

  • Siebel 8.0.0.12 Fix Pack; Unable to get the seed from binary file.

    Hello Folks,
    Can anyone shed some light on what action is required in my scenario?
    I have applied Fix Pack Siebel 8.0.0.12 on top of 8.0.0.11 SBA. After it was applied, I am facing an issue documented in the Release Notes for the 8.0.0.12 Fix Pack.
    The issue is "UNABLE TO LAUNCH URL AFTER APPLYING SIEBEL 8.0.0.12". I tried the steps given in the MR document; however, I am still having this issue.
    I am also not sure what is expected at the step "Run the following command: seedgeneratorutil myseed.dat abcdef".
    It asks me to enter a value for the seed at the command prompt: "Enter the seed":
    What should I give here? As a guess, I gave SADMIN and tried to launch, but it still shows the same error.
    Please assist.
    Steps Details from Release Notes:
    UNABLE TO LAUNCH URL AFTER APPLYING SIEBEL 8.0.0.12
    Component: Server Infrastructure
    Subcomponent: SWSE
    Product Version: Siebel 8.0.0.12
    Base Bug ID: 11938270
    **Users are unable to launch the URL after applying the Siebel 8.0.0.12 Fix Pack.
    **Use the following workaround to address this issue:
    Navigate to the eappweb/bin directory from the command line on the SWSE installation.
    Run the following command:
    seedgeneratorutil myseed.dat abcdef
    NOTE: In the example, myseed.dat is a filename. You can give any file name you wish.
    The myseed.dat file is generated in the eappweb/bin directory.
    Edit eapps.cfg to include the following parameters under the SWE section:
    seedfile = < complete path for myseed.dat >
    Bounce the web server.
    (For Linux only) Copy libmod_swe.so from the eappweb/bin folder to the web/ohs/modules folder
    Thanks
    Kumar

    Wilson,
    Thanks for your reply. I have repeated the steps and regenerated the error messages.
    Browser
    Message:
    An error occurred while trying to process your request. This error indicates a problem with the configuration of this server and should be reported to the webmaster (along with any errors listed below). We apologize for the inconvenience
    Initialization error:
    Unable to get the seed from binary file.
    Log
    2021 2011-09-20 23:23:01 0000-00-00 00:00:00 +0530 00000000 001 003f 0001 09 ss110920_7068 7068 7852 E:\sba80\SWEApp\log\ss110920_7068.log 8.0.0.12 [20444] ENU
    ProcessPluginState     ProcessPluginStateError     1     000000024e781b9c:0     2011-09-20 23:23:01     7852: [SWSE] Unable to get the seed from binary file.
    Eapps.cfg
    [swe]
    Language = enu
    Log = errors
    LogDirectory = $(SWSERoot)\log
    ClientRootDir = $(SWSERoot)
    SessionMonitor = False
    AllowStats = true
    LogSegmentSize = 0
    LogMaxSegments = 0
    DisableNagle = False
    seedfile = E:\sba80\SWEApp\BIN\80012seed.dat
    Thanks
    Kumar

  • Report using Binary file

    Hi All , can anyone help me here ...
    I used “Write to Binary” to write my report. In the first stage this file is written with headers, including column headings. In the second writing stage it writes data in columns. This file can be viewed in .doc or .xls format. I have three issues:
    1. If I want to print this file from the Front Panel, what should I do?
    2. If I stop logging and then start for another span, it starts writing from the first line instead of from the last line, which causes overwriting.
    3. If for the 2nd or 3rd logging I want to display sub-headings (test-1, test-2), how do I insert them? For example:
    MAIN HEADING (COMPANY NAME , REPORT TITLE )
    DATE , PRODUCT INFO , OPERATOR NAME
    COL1              COL2              COL3              COL4
    Subheading (test-1)
    Value1             Value2             Value3             Value4
    Value1             Value2             Value3             Value4
    Value1             Value2             Value3             Value4
    Subheading (Test-2)
    Value1             Value2             Value3             Value4
    Value1             Value2             Value3             Value4
    Value1             Value2             Value3             Value4
    Any help in this regard will be highly appreciated.
    Regards
    Faiyaz
    Attachments:
    report-writing-binaryfile.vi ‏68 KB
    02-display-subVI.vi ‏12 KB

    Hi Faiyaz,
    1. How Do I Print a File Programmatically From LabVIEW?
    2. Place the Programming >> File I/O >> Advanced File Functions >> Set File Position function on the block diagram after the Open/Create/Replace File function and set the from input to "end." This will append new data to the end of the file.
    3. You can simply add that information to the Format into String you are writing to the second Write to Binary file.
    Michael K.
    | Michael K | Project Manager | LabVIEW R&D | National Instruments |
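
    For readers outside LabVIEW, the idea behind answers 2 and 3 (position the file at its end before each logging span, and write the sub-heading as part of the data) can be sketched in Python. This is an illustration under assumptions, not the VI from the thread: the function name log_span, the tab-separated layout and the sample values are all hypothetical.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

def log_span(path, subheading, rows):
    # "ab" opens for appending: every write lands at the current end of the file,
    # so a later logging span never overwrites an earlier one.
    with open(path, "ab") as f:
        f.write((subheading + "\r\n").encode("ascii"))
        for row in rows:
            f.write(("\t".join(row) + "\r\n").encode("ascii"))

log_span(path, "Subheading (test-1)", [("1.0", "2.0")])
log_span(path, "Subheading (test-2)", [("3.0", "4.0")])  # second span is appended

with open(path, "rb") as f:
    report = f.read().decode("ascii")
os.remove(path)
print(report)
```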

  • How to open and read binary files?

    How do I open and read Binary files?

    Did you  look on The Unarchiver's web site where it has a link to older versions? http://theunarchiver.googlecode.com/files/TheUnarchiver3.2_legacy.zip
    The best thing to do is ask your friends what programs they used to produce these files, or at least what format files they are producing.  Otherwise it's like being shown a car and given a bundle of 200 keys with no idea which one to use, or even if any of them work with that car.
    Using The Unarchiver will likely not do anything because it too will not know what format files are involved, and they may not even be in an archived format.  If they sent you a Word file without telling you (a favorite of Windows users to do  -- it drives me crazy when they could have just sent them in plain text), The Unarchiver won't open them.  If it's a picture file then using Hexedit will just show you a bunch of unintelligible stuff as shown in an earlier post, though you may see a line of text providing a hint.
    As I said earlier, often .bin may be an executable program which needs another program to actually interpret it.  That's what Java is trying to do.  Still, it may think it can execute the file, but it is highly unlikely somebody would send you an executable program (and if they did I would not trust it).  For all you know it may be a Windows virus.

  • How do i disable to pop up asking me if i want to save or cancel the binary file i am trying to download?

    everytime i download a show or movie from the internet a pop up asks me:
    "you have chosen to open xxxxxx which is a: Binary File from: httpxxxxx would you like to save this file - SAVE or CANCEL"
    this never used to happen on the older versions of firefox. it is so annoying - is there any way to turn it off?
    i am running mac os 10.5.8 and no, there is no option to click a 'don't ask me again' feature in the pop-up dialog.

    I'd first try downloading an installer from the Apple website using a different web browser:
    http://www.apple.com/quicktime/download/
    If you use Firefox instead of IE for the download (or vice versa), do you get a working installer?

  • Reading an object from a binary file

    i am writing objects into my binary file using printwriter class. i am able to write objects into the file but i am having problems reading the object from the file. is there any other way of going about it. i tried using the objectoutputstream and object input stream class. but i am getting run time errors coz of something to do with serialization
    i am storing records as a object into a binary file so that it is easy to seek my records

    Of course you have trouble reading objects after you wrote them with a PrintWriter.
    You should rather have fixed the serialization errors: only objects that implement Serializable correctly can be serialized.

  • DE PDP-1 binary file from Java

    Can someone here help me, please!?
    Anyone know how to convert a Java class file into a binary file that will run natively on my Digital PDP-1 computer? I just spent over $120,000 for it! Thanks.
    This resurrected thread was first posted November 25, 1960 at 8:25AM
    -------------------------------------------------------------------------

    This resurrected thread was first posted November 25, 1960 at 8:25AM
    In what time zone?

  • How can I open different binary files from BLOB column ?

    If we store some type of binary file (XLS, DOC, PDF, EML and so on, not only pictures) in BLOB column how can I show the different contents? We use designer and forms 9i with PL/SQL.
    How can I copy the files from BLOB to file in a directory or how can I pass BLOB's content to the proper application directly to open it?

    The mime type is just a string as explained above (e.g. application/pdf...). There are lot of samples here and on metalink.
    E.g. add a column mime_type varchar(30) to your blob table. Create a procedure similar to the following:
    PROCEDURE getblob
    (P_FILE IN VARCHAR2)
    IS
    vblob blob;
    vmime_type myblobs.mime_type%type;
    length number;
    begin
         select document, mime_type into vblob,vmime_type from myblobs where docname = p_file;
         length := dbms_lob.getlength(vblob);
         if length = 0 or vblob is null then
         htp.p('Document not available yet.');
         else
         owa_util.mime_header(vmime_type);
         htp.p('Content-Length: ' || dbms_lob.getlength(vblob));
         owa_util.http_header_close;
         wpg_docload.download_file(vblob);                
         end if;
    exception
         when others then
         htp.p(sqlerrm);
    END;
    Create a DAD on your application server (refer to documentation on how to create a DAD).
    Display the blob from forms (e.g. on a when-button-pressed trigger):
    web.show_document('http://myserver:port/DAD/getblob?p_file=myfilename','_blank');
    For storing blobs in a directory on your db server take a look at the dbms_lob package.
    For storing blobs in a directory on your app server take a look at WebUtil available on OTN.
    HTH
    Gerald Krieger

  • How do you open multiple binary files and plot them all on the same graph?

    I have three different binary files and I want to plot all 3 of them onto a graph.
    I am familiar with opening and reading a single binary file. (Thanks to the help examples!) 
    But to do multiple numbers at the same time?  I was thinking of putting 3 different 'reading from binary file' blocks with the rest of the 'prompts', 'data type', etc.. and then connecting them on the same wire.
    However, I got into a mess already when I tried to read one .bin file to dynamic data type --> spectral measurements --> graph waveform.  The error was "Not enough memory to complete this operation"...  Why is that?  That didn't happen in the help example "Read Binary File.vi"...  Has it got something to do with the dynamic data type?
    Thank you for your time.
    Jud~

    Have a look at the image below and attached VI.  Simply enter the different paths into the PathArray control.
    R
    Message Edited by JoeLabView on 07-30-2008 09:59 PM
    Attachments:
    multipleBinary2Graph.vi ‏18 KB
    multipleBinary.PNG ‏5 KB

  • How can I convert the binary file content to XML message

    Dear friends,
    I poll the binary file from an FTP server, but the payload only includes the binary content, with no XML structure. I hope to convert the binary content to an element node within the XML structure. How can I do that? Via content conversion?
    Thanks and regards,
    Bean

    Read the binary file stream using standard Java I/O functions and convert the stream to Base64 format. Then map this content to one of the fields in the target XML structure.
    You need a Java mapping for this.
    What is your target system?
    Thanks,
    Gujjeti.

    Hi Gujjeti,
    Thanks a lot for your kind help, my target system is R/3.
    Can I achieve that with a UDF or a simple way?
    Regards,
    Bean
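
    The approach suggested in the reply (Base64-encode the binary stream, then place it in a field of the target XML) can be sketched as follows. This is only an illustration of the idea in Python, not a Java mapping or a UDF for the integration system in question. The element names Message and Content, the attribute names and the function name wrap_binary are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_binary(payload: bytes, filename: str) -> bytes:
    # Element and attribute names are hypothetical; use those of the target structure.
    root = ET.Element("Message")
    content = ET.SubElement(root, "Content", {"filename": filename, "encoding": "base64"})
    # Base64 turns arbitrary bytes into plain ASCII text that is safe inside XML.
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="utf-8")

xml_bytes = wrap_binary(b"\x00\x01\xfe\xffbinary", "input.bin")

# Round trip: the receiver decodes the element text back to the original bytes.
decoded = base64.b64decode(ET.fromstring(xml_bytes).find("Content").text)
print(xml_bytes.decode("utf-8"))
```

    Base64 increases the payload size by roughly a third, which may matter for large files.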

  • Deleting a single element from a binary file

    I am working on a server application that must keep track of the messages that have been sent but not responded to.  After I send a message, I append it to an array cycling through an uninitialized shift register and I write it to the end of a binary file.  When I receive a response to a message, which was probably but not necessarily the first message sent, I delete that message from the array and the file.  This allows me to look up entries quickly but also to maintain a permanent record of what messages have been sent and not responded to.
    Basically, I need an intelligent way to make a file, or any other permanent storage medium, act effectively like a queue.  The problem with the current implementation is that when I delete one variable-sized entry from the file, I need to move all subsequent entries, which are usually all of the other entries in the file, forward in memory to take its place.  Is there a way to get around recopying the majority of the file or to implement this entire thing more intelligently?

    You can organize your data file as a list. For example, in this list each record consists of a fixed-size data field and references (file positions, for example) to the next record and to the previous record. You can then overwrite the record to be deleted with a new one, or you can "delete" the record by modifying the references of the "previous" and "next" records. When you add a new record, you can write it into this "free" place.
    The other way is to manage two files. The first one contains your data, record by record. The second one contains the record numbers and the corresponding data file positions or record counters. This second file can be very small and easy to manage. It can represent just the records' queue positions in the data file. You can rewrite your records or mark them as deleted without moving large portions of data.
    You can set the data file capacity in terms of "number of records".
    Additionally, it is possible to use variable-length records, but that is much more difficult.
    Best regards, Evgeny. 
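
    A minimal sketch of the "mark as deleted and reuse the free place" idea, in Python and under simplifying assumptions: records have a fixed size, a one-byte flag in each record marks it live or deleted, and the free-slot list is kept in memory instead of in a second file. The 16-byte payload size and all names are hypothetical.

```python
import io
import struct

PAYLOAD = 16                          # assumption: fixed payload size in bytes
REC = struct.Struct(f"<B{PAYLOAD}s")  # 1 flag byte (1 = live, 0 = deleted) + payload

def delete(f, slot):
    """Mark one record as deleted by rewriting only its flag byte."""
    f.seek(slot * REC.size)
    f.write(b"\x00")

def add(f, payload, free_slots):
    """Reuse a freed slot if there is one, else append; return the slot used."""
    f.seek(0, io.SEEK_END)
    slot = free_slots.pop() if free_slots else f.tell() // REC.size
    f.seek(slot * REC.size)
    f.write(REC.pack(1, payload))     # struct pads short payloads with zero bytes
    return slot

def live_records(f):
    """Return the payloads of all records whose flag is still set."""
    f.seek(0)
    out = []
    while chunk := f.read(REC.size):
        flag, payload = REC.unpack(chunk)
        if flag:
            out.append(payload.rstrip(b"\x00"))
    return out

f = io.BytesIO()                      # stands in for open(path, "r+b")
free = []
for msg in (b"msg-A", b"msg-B", b"msg-C"):
    add(f, msg, free)
delete(f, 1)                          # msg-B answered: one byte written, nothing moved
free.append(1)
reused = add(f, b"msg-D", free)       # lands in the freed slot
print(reused, live_records(f))
```

    Reusing slots means file order no longer matches send order, so a timestamp or sequence number would have to be stored in each record if the order matters.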
