Calculating hash values for really big files

I am using the following code to calculate the hash values of files:

public static String hash(File f, String algorithm)
        throws IOException, NoSuchAlgorithmException {
    if (!f.isFile()) {
        throw new IOException("Not a file");
    }
    RandomAccessFile raf = new RandomAccessFile(f, "r");
    byte[] b = new byte[(int) raf.length()];
    raf.readFully(b);
    raf.close();
    MessageDigest messageDigest = MessageDigest.getInstance(algorithm);
    messageDigest.update(b);
    return toHexString(messageDigest.digest());
}

Now the problem is, for really big files, 100 MB or more, I get an OutOfMemoryError.
I have used the -Xms and -Xmx options to increase the JVM heap size, and ultimately made it work. However, I think this is lame, and there is also a limit to the -Xmx option.
Is there any other way I can calculate the hash values of these really big files?
Thanks a lot in advance.

Why do you open the file the way you do?
Why do you load the whole file into memory at once?
I would do it like this:
FileInputStream fis = new FileInputStream(f);
MessageDigest messageDigest = MessageDigest.getInstance(algorithm);
byte[] buffer = new byte[8192];              // small, fixed-size chunk
int read;
while ((read = fis.read(buffer)) != -1) {
    messageDigest.update(buffer, 0, read);   // hash only the bytes actually read
}
fis.close();
return toHexString(messageDigest.digest());
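
An equivalent approach is java.security.DigestInputStream, which updates the digest as a side effect of reading. A minimal sketch under the same assumptions as the original post (the toHexString helper comes from there; Java 7+ for try-with-resources):

import java.io.*;
import java.security.*;

public static String hashStreaming(File f, String algorithm)
        throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    try (InputStream in = new DigestInputStream(
            new BufferedInputStream(new FileInputStream(f)), md)) {
        byte[] buf = new byte[8192];
        // Reading drives the digest; the bytes themselves are discarded.
        while (in.read(buf) != -1) { /* keep reading */ }
    }
    return toHexString(md.digest());
}

Either way, memory use stays constant no matter how large the file is, so no -Xmx tuning is needed.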

Similar Messages

  • Generating MD5 hash value for any specific flat file

    Hi experts,
    I am developing a program that generates flat files, and I also need to generate an MD5 hash value for each flat file. My question is: how can I generate the MD5 hash value for the generated .txt files?
    Thanks in advance
    Shabir

    You can use the function modules
    MD5_CALCULATE_HASH_FOR_CHAR for text files and
    MD5_CALCULATE_HASH_FOR_RAW for binary data.

  • Create hash value for clob column ?

    Hi,
    does anybody know a way to calculate a hash value for a CLOB column (9i)?
    DBMS_UTILITY.GET_HASH_VALUE can only handle VARCHAR2(4000).
    Thank you!
    Regards
    Michael
    Message was edited by:
    mseiwert

    I can't reproduce it on my 10.2.0.4.0. CTL file:
    load data
    INFILE *
    Replace into table samp
    fields terminated by ","
    trailing nullcols
    (
    no,
    col1 Char(100000000),
    col2 Char(100000000) enclosed by '"' and '"'
    )
    BEGINDATA
    1,asdf,"assasadsdsdsd""sfasdfadf""sdsdsa,ssfsf"
    2,sfjass,"dksadk,kd,ss""dfdfjkdjfdk""sasfjaslaljs"

    Loading:
    SQL> Create table samp
      2  (
      3  no number,
      4  col1 clob,
      5  col2 clob
      6  );
    Table created.
    SQL> host sqlldr scott/tiger control=c:\temp\samp.ctl log=c:\temp\samp.log
    SQL> select * from samp
      2  /
            NO
    COL1
    COL2
             1
    asdf
    assasadsdsdsd"sfasdfadf"sdsdsa,ssfsf
             2
    sfjass
    dksadk,kd,ss"dfdfjkdjfdk"sasfjaslaljs
            NO
    COL1
    COL2
    SQL> SY.
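
    For the original question: on 10g and later, DBMS_CRYPTO.HASH accepts a CLOB directly, which sidesteps the VARCHAR2(4000) limit of DBMS_UTILITY.GET_HASH_VALUE. A minimal sketch against the samp table above (requires EXECUTE on DBMS_CRYPTO, which a DBA must grant):
    SET SERVEROUTPUT ON
    DECLARE
      l_hash RAW(20);  -- SHA-1 digests are 20 bytes
    BEGIN
      SELECT DBMS_CRYPTO.HASH(col1, DBMS_CRYPTO.HASH_SH1)
        INTO l_hash
        FROM samp
       WHERE no = 1;
      DBMS_OUTPUT.PUT_LINE(RAWTOHEX(l_hash));
    END;
    /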

  • Changing MAXSIZE value for a data file.

    Hello, I was wondering if it is possible to change the MAXSIZE value for a data file from UNLIMITED to a specified value, like 4M.
    Thank you,
    David

    This is the command that actually did what I wanted:
    ALTER DATABASE DATAFILE 'D:\DBDATA\TESTPTIX02.ORA' AUTOEXTEND ON MAXSIZE 3840M;
    Thank you all for your time and help. Have a safe and happy Christmas and New Year.
    David
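
    For reference, the result can be confirmed from the data dictionary; a minimal check (standard DBA_DATA_FILES view, so DBA privileges are needed):
    -- Verify AUTOEXTEND and MAXSIZE after the ALTER:
    SELECT file_name,
           autoextensible,
           maxbytes / 1024 / 1024 AS max_mb
      FROM dba_data_files
     WHERE file_name = 'D:\DBDATA\TESTPTIX02.ORA';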

  • Related to calculation of value from two source files

    Hi, we have two files, one for volumes and one for costs.
    Time, Item, Site, ASM, and Retail are the dimensions.
    The volume one is:
    May-09     item 1     Site 1     ASM 1     Retail     VOL     100
    May-09     item 2     Site 1     ASM 1     Retail     VOL     150
    May-09     item 3     Site 1     ASM 1     Retail     VOL     130
    May-09     item 4     Site 1     ASM 1     Retail     VOL     120
    May-09     item 4     Site 1     ASM 2     Retail     VOL     150
    May-09     item 4     Site 2     ASM 3     Retail     VOL     100
    The Cost one is:
    May-09     item 1     Site 1     1.2
    May-09     item 2     Site 1     1.3
    May-09     item 3     Site 1     1.1
    May-09     item 4     Site 2     1.3
    May-09     item 4     Site 1     1.5
    note that in the second file the ASM and Retail members are missing (this was the problem for us with the source file)
    Here in Essbase we need to calculate VALUE = VOL * COST with respect to Item and Site, such that in the selection criteria
    the value is also represented with respect to the ASM and Retail dimensions.
    Please post an approach and solution for how to load this and how to calculate it.

    Hi,
    You would have to transform the second file to include the ASM and Retail dimension members. It is essential that you have a one-to-one relationship between Site, ASM and Retail. You can maintain the mapping in a separate file, then read the cost file and get the values for ASM and Retail from the mapping file.
    Otherwise in the load rule, you will have to specify one single member from each dimension - ASM, Retail in the header definition.
    Once you have the data in the system, you can run a calculation to calculate the value of VALUE. Or else, you can also define a formula on VALUE.
    Let me know if it helps.
    Cheers
    RS

  • Suggestion needed for processing Big Files in Oracle B2B

    Hi,
    We are doing a feasibility study on using Oracle AS Integration B2B instead of TIBCO. We presently use TIBCO for our B2B transactions. Since my client company is planning to implement Fusion Middleware (Oracle ESB and Oracle BPEL), we are also looking at Oracle AS Integration B2B for B2B transactions (in other words, we are planning to replace TIBCO with Oracle Integration B2B if possible).
    I am really concerned about one thing: receiving and processing "big files" (around 15 MB) from a trading partner.
    Present scenario: one of our trading partners sends invoice documents in a single file, and that file can grow up to 15 MB. In our existing setup, when we receive such big files (through TIBCO Business Connect - BC), TIBCO BC works fine for 1 or 2 files but crashes once it has received several files of that size. What happens is that the memory TIBCO BC consumes to receive one such big file is not released after processing, so TIBCO BC eventually throws an "OUT OF MEMORY" error.
    My questions:
         1. How robust is Oracle AS Integration B2B in terms of processing such big files?
         2. Is there any upper limit on the size of data that Oracle AS Integration B2B can receive and process?
         3. What is the average time required to receive and process such a big file? (Let's say 15 MB.)
         4. Is there any documentation available that covers handling such big files in Oracle B2B?
    Please let me know if you need more information.
    Thanks in advance.
    Regards,
    --Kaushik

    Hi Ramesh,
    Thanks for your comment. We will try to do POC ASAP. I will definitely keep in touch with you during this.
    Thanks bunch.
    Regards,
    --Kaushik

  • Calculating HASH values with WCCP

    Ok, I'm just not getting the HASH calculations.  Can somebody please explain how the HASH values translate into subnets?
    Thanks,
    Patrick

    Patrick,
    I'm not 100% sure of the algorithm used to determine which subnet is assigned to which WCCP bucket. However, I do know it involves an XOR of various L3 and L4 header fields in the packet.
    To view how the calculation has been performed, you can run the hidden IOS command
    show ip wccp hash <dst-ip> <src-ip> <dst-port> <src-port>
    Router# show ip wccp 61 hash 0.0.0.0 10.88.81.10 0 0
    WCCP hash information for:
        Primary Hash:   Src IP: 10.88.81.10
      Bucket:   9
        WCCP Client: 10.88.81.12
    Router#
    Hope this helps,
    Mike Korenbaum
    Cisco WAAS PDI Help Desk
    http://www.cisco.com/go/pdihelpdesk

  • Plan hash value for two queries!

    Hi,
    DB : Oracle 11g (11.2.0.3.0)
    OS: RHEL 5
    I have two question:
    1. Can two queries have same plan hash value? I mean I have below two queries:
    SELECT /*+ NO_MERGE */ MIN(payor.next_review_date)
      FROM payor
     WHERE payor.review_complete = 0
       AND payor.closing_date IS NULL
       AND payor.patient_key = 10;
    and
    SELECT MIN(payor.next_review_date)
      FROM payor
     WHERE payor.review_complete = 0
       AND payor.closing_date IS NULL
       AND payor.patient_key = 10;
    When I reviewed the execution plan for both queries, the plan hash value remained the same. Does that mean the execution plan for both queries is the same? If yes, then how does Oracle understand or change the execution plan based on the hint? If no, then what does the plan hash value represent?
    2. If the execution plan with the hint and without the hint is the same for a given query, except for the number of rows and bytes, does that mean the query with fewer rows and bytes scanned is better?
    Thanks in advance
    -Onkar

    Hi,
    there are two different things. One is the EXPLAIN PLAN, which is how the optimizer thinks the query will be executed. It contains estimates of cost, cardinalities etc. There is also the EXECUTION PLAN. It contains all of that optimizer-estimate information, but on top of it, it also contains information about actual I/O incurred, actual cardinalities, actual timings etc.
    So if a hint changes optimizer estimates but the plan stays the same, then the impact on the query's performance is zero.
    If the actual numbers are changing, this is probably also unrelated to the hint (e.g. you can have fewer physical reads because more blocks are found in the buffer cache the second time you run the query, or you do less work because you don't have to parse the statement etc.).
    Actually, most optimizer hints don't affect optimizer estimates; rather, they try to get the optimizer to use a certain access method or a certain join order etc. regardless of the cost. So you must be talking about hints such as CARDINALITY or DYNAMIC_SAMPLING. If that's not the case, please clarify, because it would mean something wrong is going on here (e.g. an INDEX hint may work or it may fail to work, but if it fails, optimizer estimates shouldn't change).
    Best regards,
    Nikolay
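
    If you want to check this yourself, the plan hash value of each cached child cursor is exposed in V$SQL; a hedged sketch (standard 11g views; the LIKE filter is only illustrative):
    -- One row per child cursor; identical PLAN_HASH_VALUE means identical plan shape.
    SELECT sql_id, child_number, plan_hash_value
      FROM v$sql
     WHERE sql_text LIKE 'SELECT%MIN(payor.next_review_date)%';
    -- Then display a specific plan (ALLSTATS LAST shows actuals if statistics were gathered):
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST'));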

  • Looking for sha-1 hash values for the Windows 10 Enterprise x64 & x86 ISO's

    When I downloaded the ISOs, I either missed the hash values or they weren't there. I still can't find them. I'd sure like to validate the ISOs before trying to install.
    Can anyone point me to their locations or post them here?
    Mahalo

    Maybe aboodi86 just ran a sha1 generator on the downloads.
    For what it's worth, I got the exact same value listed above for my x64 download, so it's legit.  I didn't download the x86 version.
    -Noel
    Detailed how-to in my eBooks:  
    Configure The Windows 7 "To Work" Options
    Configure The Windows 8 "To Work" Options
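
    For what it's worth, Windows ships a built-in hash tool, so no third-party generator is needed (the ISO path below is hypothetical):
    certutil -hashfile C:\ISOs\Win10_Enterprise_x64.iso SHA1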

  • Calculating Accumulative Value for a particular period

    Hi,
    I want to calculate accumulative values based on 0calmonth for a key figure.
    In the rows I have 0calmonth, with a key figure in the columns. If I set the key figure property to "Cumulative", it adds the values up: the first month shows the first month's value, and the 2nd month shows the sum of the 1st and 2nd months. But when I supply an interval variable on 0calmonth (e.g. 03.2006 to 09.2006), it starts cumulating from the 3rd month, so the 4th month shows only the 3rd plus 4th months. What I want is "accumulative" values from the start of the year: even though the period is 03.2006 to 09.2006, the 3rd month should show the sum of the 1st, 2nd and 3rd months, and so on up to the last month of the given period.
    Please can any one suggest me....
    Thanks and Regards
    Rajesh
    Message was edited by:
            rajesh

    Hi,
    For my above problem I am using the code below. It compiles without errors, but when the query is displayed in the web browser it is not getting any values.
    DATA: L_S_RANGE1     TYPE RSR_S_RANGESID.
    DATA: LOC_VAR_RANGE1 LIKE RRRANGEEXIT.
    DATA: L_VALUE        LIKE RRRANGEEXIT-HIGH.

    CASE I_VNAM.
      WHEN 'ZCUM_INTERVAL'.
        IF I_STEP = 2.
          LOOP AT I_T_VAR_RANGE INTO LOC_VAR_RANGE1 WHERE VNAM = '0I_CMNTH'.
            L_VALUE = LOC_VAR_RANGE1-LOW.
            WHILE L_VALUE+4(2) < LOC_VAR_RANGE1-HIGH+4(2).
              IF SY-INDEX > 1.
                L_VALUE+4(2) = L_VALUE+4(2) + 1.
                IF STRLEN( L_VALUE+4(2) ) = 1.
                  CONCATENATE '0' L_VALUE+4(2) INTO L_VALUE+4(2).
                ENDIF.
              ENDIF.
              CLEAR L_S_RANGE1.
              L_S_RANGE1-LOW      = LOC_VAR_RANGE1-LOW(4).
              L_S_RANGE1-LOW+4(2) = '01'.
              L_S_RANGE1-HIGH     = L_VALUE.
              L_S_RANGE1-SIGN     = 'I'.
              L_S_RANGE1-OPT      = 'BT'.
              APPEND L_S_RANGE1 TO E_T_RANGE.
            ENDWHILE.
          ENDLOOP.
        ENDIF.
    ENDCASE.
    Please, can anyone suggest what is wrong here?
    Thanks in Advance...
    TR
    Rajesh

  • Ora_hash - Same hash value for different inputs (Urgent)

    Hi,
    Trying to use ora_hash to join between tables, but I noticed that in some cases ora_hash generates the same result for different input values.
    select ora_hash('oomeroe03|6NU3|LS006P|7884|1|17-JUL-13 13.18.22.528000|0005043|'),ora_hash('GSAHFFXTK|GCQ3|A6253S|12765|1|17-JUL-13 17.26.26.853000|0136423|')
    from dual
    Output value : 1387341941
    Oracle version is 11gR2.
    Thanks

    Why would anyone limit the hash distribution to three buckets?
    However, one must understand that the default seed is 0, so the same input always gets the same hash value unless the seed is changed.
    SQL> select ora_hash(rn) , rn
      2  from
      3  (Select rownum as rn from dual connect by level < 11)
      4  order by rn;
    ORA_HASH(RN)         RN
      2342552567          1
      2064090006          2
      2706503459          3
      3217185531          4
       365452098          5
      1021760792          6
       738226831          7
      3510633070          8
      1706589901          9
      1237562873         10
    10 rows selected.
    SQL> l
      1  select ora_hash(rn) , rn
      2  from
      3  (Select rownum as rn from dual connect by level < 11)
      4* order by rn
    SQL> /
    ORA_HASH(RN)         RN
      2342552567          1
      2064090006          2
      2706503459          3
      3217185531          4
       365452098          5
      1021760792          6
       738226831          7
      3510633070          8
      1706589901          9
      1237562873         10
    10 rows selected.
    SQL> /
    ORA_HASH(RN)         RN
      2342552567          1
      2064090006          2
      2706503459          3
      3217185531          4
       365452098          5
      1021760792          6
       738226831          7
      3510633070          8
      1706589901          9
      1237562873         10
    10 rows selected.
    SQL>
    Hemant K Chitale
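
    Also worth noting: ORA_HASH is a 32-bit hash, so by the pigeonhole principle distinct inputs can legitimately collide. Its full signature is ORA_HASH(expr, max_bucket, seed); keeping the default bucket range but varying the seed changes the mapping:
    -- Same input, two different seeds => two different hash values:
    SELECT ora_hash('oomeroe03|6NU3|LS006P|7884|1|17-JUL-13 13.18.22.528000|0005043|', 4294967295, 0) AS seed_0,
           ora_hash('oomeroe03|6NU3|LS006P|7884|1|17-JUL-13 13.18.22.528000|0005043|', 4294967295, 1) AS seed_1
      FROM dual;
    For joins, comparing the underlying columns (or a cryptographic hash such as DBMS_CRYPTO.HASH) is safer than relying on ORA_HASH being collision-free.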

  • Tips for downloading big files(World Of Warcraft)

    I need to download 8 more gigs of the game. I'm downloading directly from Blizzard and it's going at about 150 kb/s. Just wondering what tips you all have for downloading big files like this. Also, if my MacBook Pro goes to sleep, will it slow down the download? That's what seems to happen when I leave it on overnight.

    Do it in the evening through overnight. Set the MBP to not sleep. System Preferences>Energy Saver>Computer Sleep = Never.

  • Hash value in my DME file...

    Hi all,
    I am not sure whether this is an issue I should raise with SAP via OSS, or whether there is an existing OSS note for it...
    Recently we upgraded from 4.6C to ECC 6, and of course the Unicode conversion was carried out. After the upgrade, my DME (program RFFOM100) is generating '#' values throughout my DME file. This causes the bank to reject the file.
    1. I searched the service marketplace, but I can't find any OSS notes on this issue.
    2. I tried debugging the program; the code is SAP standard and it puts '#' into every character space of my file.
    3. I am not sure if it is Unicode that is causing this problem...
    Should i raise an OSS to SAP? or there is an OSS notes that I am not aware of?
    Thanks,
    William Wilstroth

    I am closing this thread because this issue is from way back last year... oh dear... I can't remember if I ever used Nils's OSS note to solve it...
    But thanks, Nils, for your advice.

  • Help needed in calculating hash code for a linked list??

    I have referred to the API documentation for the List interface... There I found the code below for the hashCode() method... I couldn't get why "31" is used as a multiplicative factor.
    int hashCode = 1;
    Iterator<E> i = list.iterator();
    while (i.hasNext()) {
        E obj = i.next();
        hashCode = 31 * hashCode + (obj == null ? 0 : obj.hashCode());
    }
    I'm a beginner... please help me out.

    Because it's a prime number, I think. You'll probably want to find an article or decent book on creating optimal hash functions.
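
    To add a little detail: 31 is an odd prime, so multiplying by it never just shifts a bit off the top the way an even multiplier would, and the JVM can replace the multiplication with a cheaper shift and subtract. A small illustration:
    int h = 0x12345678;
    System.out.println(31 * h == (h << 5) - h);  // prints true for every int value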

  • Time Capsule with ext HDD seems too slow for ethernet big file transfer

    With the new Snow Leopard on a Mac Mini, I have a wired Time Capsule with an external WD USB 2.0 HDD added. Moving files over 802.11n seems to work fine, but a full wired Ethernet connection seems to send the files too fast for the WD external drive to keep up. I get only the first 50 MB before the transfer gets corrupted. Isn't there a handshake that limits the transmission so the USB drive can stay ahead?
    Has anyone else seen this? Is there a remedy?
    Thanks
    Bill H.

