Problem with large text-files, HOWTO?

Hi!
I'm making an application which shall search through a directory with 3000 HTML files and find all links in those files.
I have a text file with the format:
file1: linktofile:linktofile6:linktofile5
file2: linktofile1:linktofile87:
and so on.
This file shall then be searched whenever I click a hyperlink in Internet Explorer. The problem is that this file is VERY long, both "horizontally and vertically". Is there a clever way to shorten it?

If you have to search the entire contents of all 3000 files every time, then I don't see how that could be shortened. But if you have to search those files only for instances of "linktofile1295", for example, then you could redesign your text files into a database where you could access those instances directly via an index.
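As a rough illustration of that idea, here is a minimal sketch in Java, assuming the colon-separated format shown above; the file name links.txt and the lookup key linktofile1 are made up for the example. It builds an in-memory index from each link target to the files that reference it, so a lookup no longer has to scan the whole text file:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

public class LinkIndex {
    public static void main(String[] args) throws IOException {
        // Maps each link target to every file that references it.
        Map<String, List<String>> index = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("links.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Expected format: "file1: linktofile:linktofile6:linktofile5"
                int colon = line.indexOf(':');
                if (colon < 0) continue;
                String file = line.substring(0, colon).trim();
                for (String link : line.substring(colon + 1).split(":")) {
                    link = link.trim();
                    if (!link.isEmpty()) {
                        index.computeIfAbsent(link, k -> new ArrayList<>()).add(file);
                    }
                }
            }
        }
        // Direct lookup instead of scanning the whole text file.
        System.out.println(index.getOrDefault("linktofile1", Collections.emptyList()));
    }
}

For 3000 HTML files an index like this fits comfortably in memory; moving to a real database only becomes worthwhile if the link data outgrows RAM.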

Similar Messages

  • Problems with Large XML files

    I have tried increasing the memory pool using the -mx and -ms options. It doesn't work. I am using your latest XML parser for Java v2. Please let me know if there are some specific options I should be using.
    Thanx,
    -Sameer
    We have a number of test files that are that size and they work without a problem. However, using the DOMParser does require significantly more memory than your document size.
    What is the memory configuration of the JVM that you are running with? Have you tried increasing it? Are you using our latest version 2.0.2.6?
    Oracle XML Team
    Is there a restriction on the XML file size that can be loaded into the parser?
    I am getting an out of memory exception reading in a large XML file (10MB) using the commands:
    DOMParser parser = new DOMParser();
    URL url = createURL(argv[0]);
    parser.setErrorStream(System.err);
    parser.setValidationMode(true);
    parser.showWarnings(true);
    parser.parse(url);
    Win NT 4.0 Server
    Sun JDK 1.2.2
    ===================================
    Error output
    ===================================
    Exception in thread "main" java.lang.OutOfMemoryError
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at java.util.Hashtable.<init>(Unknown Source)
    at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(ValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java, Compiled Code)
    at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingParser.java:97)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:199)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
    at TestLF.main(TestLF.java:40)
    null

    You might try using a different JDK/JRE - either a 1.1.6+ or 1.3 version, as 1.2 in our experience has the largest footprint. If this doesn't work, can you give us some details about your system configuration? Finally, you might try the SAX interface, as it does not need to load the entire DOM tree into memory.
    Oracle XML Team
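    To make the SAX suggestion concrete, here is a minimal sketch using the standard JAXP SAX API rather than the Oracle-specific classes; the handler and the file name large.xml are placeholders. Because SAX delivers the document through callbacks as it streams, memory use stays roughly constant no matter how big the file is:

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class SaxCount {
        public static void main(String[] args) throws Exception {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);          // mirrors setValidationMode(true) above
            SAXParser parser = factory.newSAXParser();

            // Count elements without ever building a DOM tree in memory.
            final int[] count = {0};
            parser.parse(new java.io.File("large.xml"), new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            System.out.println("Elements parsed: " + count[0]);
        }
    }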

  • Problem with Date - Text File Source and Oracle Target

    Hi All,
    I have source data (a text file) with a date column in the format 'MM/DD/YYYY'. My target is Oracle. I am using the LKM File to SQL and IKM SQL Control Append. When I execute this interface, I am getting an error:
    7000 : null : com.sunopsis.jdbc.driver.file.b.i
    com.sunopsis.jdbc.driver.file.b.i
    at com.sunopsis.jdbc.driver.file.b.f.getColumnClassName(f.java)
    at the Load Data step.
    How do I load date columns from a text file into Oracle tables? Please help me in resolving this...
    Thanks in Advance,
    Ram Mohan T

    The worst solution is to define the date as a string for your text file and then rebuild your date type on the Oracle side,
    something like
    TO_DATE(SUBSTR(myfield,1,2) || SUBSTR(myfield,4,2) || SUBSTR(myfield,7,4), 'MMDDYYYY')
    This is maybe the worst solution, but it should work.
    Regards
    Brice

  • Problems with large Photoshop files CC 2014

    Hi,
    Having a strange issue with the file size of some PSDs.
    I have two basically identical images. They are both the same size (same ppi and same measurements in cm). Both of them are single layer and there are no paths or other hidden stuff. Both are in sRGB as well. The issue here is that the file on the left is about 12 MB and the one on the right is almost twice that, and I can't for the life of me figure out why. As far as I'm concerned the image on the left looks a bit more "advanced" and should be the bigger one.
    This isn't just this image but most of the images I've been working on for the last couple of weeks. I chose this one because it was easy to compare to an older image around the same size.
    I also received some images from Samsung a couple of days ago which were the same size as these ones but were 135 MB! After flattening the image to one layer it went down to about 25-27 MB, but that still feels like a lot to me.
    Have there been any changes to the way Photoshop handles file size in the latest version (CC 2014)?
    And before anyone asks: yes, they are saved with maximised compatibility.
    Any ideas? Sorry for the bad English, by the way.

    No, there are no changes to the handling of large files, or files in general.
    But mistakes in bit depth, cropping, layers, etc. could explain a file size difference for files that look sort of the same.

  • Problems with large scanned images

    I have been giving Aperture another try since 1.1 came out, and I am still having problems with large tiff files derived from scanned 4x5 negatives. The files are 500mb or more, 16 bit RGB, with ProPhoto RGB or Ektaspace PS5 profiles, directly out of the scanner.
    Aperture imports the files correctly and shows their thumbnails. When I select a thumbnail, "Loading" is displayed briefly, and then the dreaded "Unsupported Image Format" is displayed. Sometimes "Loading" goes on for a while, and a geometric pattern (looking like a rendering of random memory) is displayed. Restarting Aperture doesn't help.
    Lower resolution (250mb, 16bit) files are handled properly. The scans are from an Epson 4870 scanner. I have tried pulling the scans into Photoshop and resaving with various tiff options, and as PSD with no improvement. I have the same problem with corrected/modified psd files coming out of Photoshop CS2.
    I am running on a Power Mac G5 dual 2ghz with 8gb of RAM and an NVIDIA GeForce 6800 GT DDL (250mb) video card, with all the latest OS and software updates.
    Has anyone else had similar problems? More importantly, is anyone else able to work with 500mb files of any kind? Is it my system, or is it the software? I sent feedback to Apple as well.
    dual g5 2ghz   Mac OS X (10.4.6)  

    I have a few (well actually about 100) scans on my system of >500Mb. I tried loading a few and am getting an inconsistent pattern of errors that correlates with what you are reporting.
    I imported 4 files and three were troubled; the fourth was OK. I imported another four files and the first one was OK while the three others had your reported error; also, the previously good file from the first import was now showing the same "unsupported image" message.
    I would venture to say that if you shoot primarily 4x5 and work with scans of this size that Aperture is not the program for you--right now. I shoot 35mm and have a few images that I have scanned at 8000dpi on my Imacon 848 but most of my files are in the more reasonable 250Mb range (35mm @ 5000dpi).
    I will probably downsample my 8000dpi scans to 5000dpi and not worry too much about it. In a world where people believe that 16 megapixels is hi-res, you are obviously on the extreme side. (Good for you!) You should definitely file a bug report, but I wouldn't expect much help anytime soon for your super-sized scans.

  • A problem with copying text from english pdf to a word file

    I have a problem with copying text from an English PDF to a Word file. The English text of the PDF turns into unknown signs when I copy it into the Word file.
    I illustrated what I mean in the picture I attached. Note that I have Adobe Acrobat Reader 9. So please help, because I need to copy the text in order to translate it.

    Is this an e-book? Does it allow copying? Is it possible that the PDF file is a scan of a book?

  • Is anyone else having problems with large files, such as installation images, becoming corrupted when downloaded in Mavericks using Safari?

    I am finding that when I try to download a disk image file, such as Office 2011, it reports an invalid checksum, is corrupted and will not install. I tried to download it several times and even unchecked the verify-checksums option in the Disk Utility preference pane. The way I solved the problem was to go to my Windows machine, download the image, put it on a USB flash drive and install it on my Mac from the stick. Not a proper solution, but it did work. Has anyone out there had this problem? This applies to any large .dmg files, not just Office 2011.

    I'm having exactly the same problem with the dmg file for Office 2011. I tried downloading from my Windows machine with no luck; I'm still getting the invalid checksum message.

  • Exit labview (executables) after using large text files

    Hello,
    I am using LabVIEW 6.0 and its application builder / runtime engine. I wrote some VIs to convert large tab-delimited text files (up to 50 MB). When I am finished with a file it somehow stays in memory and piles up with other (text) files in such a way that the computer slows down.
    When I want to exit the VI (program) it takes a very long time to quit the program (resetting LabVIEW) and get my speed back.
    How can I solve this problem for these large files?
    Martin.

    OK, this may be a bit of a problem to track down, but let's start.
    First, while your front panel looks great, your code is very hard to read. Overlapping elements, multiple nested structures and a liberal use of locals make this a problem. My first suggestion would be to start with a massive cleanup operation. Make more room, make wires straight, make sure things are going left-to-right, make subVIs, place some documentation and so on. You won't believe the difference this makes.
    After you did that, we can turn to finding the problems. Some likely suspects are the local variables and the array functions. You use many local variables and perform resizing operations which are certain to generate copies. If you do this on arrays with dozens of MBs of data, this looks like the most likely source of the problem.
    Some suggestions to deal with this: if you have repeating code, make subVIs or move the code outside of the structures, so that it only has to run once. Also, you seem to have some redundant code. For instance, you open the file only to see if you get an error; you should be able to do this with the VIs in the advanced palette without opening it (and you won't need to close it, either). Another example: you check the exit conditions in many places in your code. If your loop runs fast enough, there is no need for that. Some more suggestions: use shift registers instead of locals and avoid setting the same properties over and over again in the loop.
    After you do these, it will probably be much easier to find the problem.
    To learn more about LabVIEW, I suggest you try searching this site and google for LabVIEW tutorials. Here and here are a couple you can start with. You can also contact your local NI office and join one of their courses.
    In addition, I suggest you read the LabVIEW style guide and the LabVIEW user manual (Help>>Search the LabVIEW Bookshelf).
    And one last thing - having the VI run automatically and then use the Quit VI at the end is not very nice. Since you are building it, it will run automatically on its own and you can use the Application>>Kind property to quit only if it's an executable.
    Try to take over the world!

  • Arbitrary waveform generation from large text file

    Hello,
    I'm trying to use a PXI 6733 card hooked up to a BNC 2110 in a PXI 1031-DC chassis to output arbitrary waveforms at a sample rate of 100kS/s.  The types of waveforms I want to generate are generally going to be sine waves of frequencies less than 10 kHz, but they need to be very high quality signals, hence the high sample rate.  Eventually, we would like to go up to as high as 200 kS/s, but for right now we just want to get it to work at the lower rate. 
    Someone in the department has already created for me large text files (> 1 GB) with 9 columns of numbers representing the output voltages for the channels (there will be 6 channels outputting sine waves and 3 other channels with a periodic DC voltage). The reason for the large file is that we want a continuous signal for around 30 minutes to allow for equipment testing and configuration while the signals are being generated.
    I'm supposed to use this file to generate the output voltages on the 6733 card, but I keep getting numerous errors and I've been unable to get something that works. The code, as written, currently generates error code -200290 immediately after the buffered data is output from the card. Nothing ever seems to get enqueued or dequeued, and although I've read the LabVIEW help on buffers, I'm still very confused about their operation, so I'm not even sure if the buffer is working properly. I was hoping some of you could look at my code and give me some suggestions (or sample code too!) for the best way to achieve this goal.
    Thanks a lot,
    Chris(new Labview user)

    Chris:
    For context, I've pasted in the "explain error" output from LabVIEW to refer to while we work on this. More after the code...
    Error -200290 occurred at an unidentified location
    Possible reason(s):
    The generation has stopped to prevent the regeneration of old samples. Your application was unable to write samples to the background buffer fast enough to prevent old samples from being regenerated.
    To avoid this error, you can do any of the following:
    1. Increase the size of the background buffer by configuring the buffer.
    2. Increase the number of samples you write each time you invoke a write operation.
    3. Write samples more often.
    4. Reduce the sample rate.
    5. Change the data transfer mechanism from interrupts to DMA if your device supports DMA.
    6. Reduce the number of applications your computer is executing concurrently.
    In addition, if you do not need to write every sample that is generated, you can configure the regeneration mode to allow regeneration, and then use the Position and Offset attributes to write the desired samples.
    By default, the analog output on the device does what is called regeneration. Basically, if we're outputting a repeating waveform, we can simply fill the buffer once and the DAQ device will reuse the samples, reducing load on the system. What appears to be happening is that the VI can't read samples out from the file fast enough to keep up with the DAQ card. The DAQ card is set to NOT allow regeneration, so once it empties the buffer, it stops the task since there aren't any new samples available yet.
    If we go through the options, we have a few things we can try:
    1. Increase background buffer size.
    I don't think this is the best option. Our issue is with filling the buffer, and this requires more advanced configuration.
    2. Increase the number of samples written.
    This may be a better option. If we increase how many samples we commit to the buffer, we can increase the minimum time between writes in the consumer loop.
    3. Write samples more often.
    This probably isn't as feasible. If anything, you should probably have a short "Wait" function in the consumer loop where the DAQmx write is occurring, just to regulate loop timing and give the CPU some breathing space.
    4. Reduce the sample rate.
    Definitely not a feasible option for your application, so we'll just skip that one.
    5. Use DMA instead of interrupts.
    I'm 99.99999999% sure you're already using DMA, so we'll skip this one also.
    6. Reduce the number of concurrent apps on the PC.
    This is to make sure that the CPU time required to maintain good loop rates isn't being taken by, say, an antivirus scanner or something. Generally, if you don't have anything major running other than LabVIEW, you should be fine.
    I think our best bet is to increase the "Samples to Write" quantity (to increase the minimum loop period), and possibly to delay the DAQmx Start Task and consumer loop until the producer loop has had a chance to build the queue up a little. That should reduce the chance that the DAQmx task will empty the system buffer and ensure that we can prime the queue with a large quantity of samples. The consumer loop will wait for elements to become available in the queue, so I have a feeling that the file read may be what is slowing the program down. Once the queue empties, we'll see the DAQmx error surface again. The only real solution is to load the file to memory farther ahead of time.
    Hope that helps!
    Caleb Harris
    National Instruments | Mechanical Engineer | http://www.ni.com/support
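    The producer/consumer structure described above is built in LabVIEW, but the idea is language-independent. Purely as an illustration (not NI's implementation), here is a minimal Java sketch in which a producer thread reads file lines into a bounded queue ahead of time while a consumer drains it, the same way the file-read loop should stay ahead of the DAQmx write loop; the file name waveform.txt, the queue capacity and the priming delay are arbitrary:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ProducerConsumerDemo {
        // Sentinel telling the consumer that the producer is finished.
        private static final String POISON = "\u0000EOF";

        public static void main(String[] args) throws Exception {
            // Bounded queue: the producer blocks once it is far enough ahead.
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

            Thread producer = new Thread(() -> {
                try (BufferedReader in = new BufferedReader(new FileReader("waveform.txt"))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        queue.put(line);              // blocks while the queue is full
                    }
                    queue.put(POISON);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            producer.start();

            // Let the producer prime the queue before consuming,
            // analogous to delaying the DAQmx Start Task.
            Thread.sleep(500);

            String item;
            while (!(item = queue.take()).equals(POISON)) {
                // Here each block of samples would be written to the output device.
            }
            producer.join();
        }
    }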

  • Problems with importing text messages from PC Suit...

    Problems with importing text messages from PC Suite 7.1.18.0 to my Nokia 5800.
    I am trying to import a CSV file that contains text messages (note that this file was created using PC Suite 7.1.18.0) into a subfolder that I have created under My Folders, but PC Suite only imports the text messages into the Drafts folder. Note that initially it shows the messages imported into the correct folder, but after a refresh it shows them in the Drafts folder. Is there any setting that I should change in PC Suite or on the phone? My computer runs on Windows XP Service Pack 3 and the Nokia 5800 was upgraded to the latest firmware v20.0.012.
     Thanks for your help

    Most phones only allow importing into the Drafts and Archive boxes for SMS.
    To do a restore, you need to back up the SMS messages as a .nbu file using PC Suite and restore them later.
    If you have an SD card, you can also do a backup onto the SD card (backup.arc) and then restore later (reset and restore: backup.arc and mmc).
    What's the law of the jungle?

  • Out.println() problems with large amount of data in jsp page

    I have this kind of code in my jsp page:
    out.clearBuffer();
    out.println(myText); // size of myText is about 300 kb
    The problem is that I manage to print the whole text only sometimes. Very often it happens that the receiving page gets only the first 40 kb and then the printing stops.
    I have made such tests that I split the myText to smaller parts and out.print() them one by one:
    Vector texts = splitTextToSmallerParts(myText);
    for(int i = 0; i < texts.size(); i++) {
      out.print(texts.get(i));
      out.flush();
    }
    This produces the same kind of result. Sometimes all parts are printed, but mostly only the first parts.
    I have tried to increase the buffer size, but that doesn't make the printing reliable either. I have also tried with autoFlush="false" so that I flush before the buffer overflows; again the same result, sometimes it works and sometimes it doesn't.
    Originally I use such a system where Visual Basic in Excel calls a jsp page. However, I don't think that this matters since the same problems occur if I use a browser.
    If anyone knows something about problems with large jsp pages, I would appreciate that.

    Well, there are many ways you could do this, but it depends on what you are looking for.
    For instance, generating an Excel Spreadsheet could be quite easy:
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;
    public class TableTest extends HttpServlet{
         public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
              response.setContentType("application/xls"); // "application/vnd.ms-excel" is the more conventional Excel MIME type
              PrintWriter out = new PrintWriter(response.getOutputStream());
              out.println("Col1\tCol2\tCol3\tCol4");
              out.println("1\t2\t3\t4");
              out.println("3\t1\t5\t7");
              out.println("2\t9\t3\t3");
              out.flush();
              out.close();
         }
    }
    Just try this simple code, it works just fine... I used the same approach to generate a report of 30000 rows and 40 cols (more or less 5MB), so it should do the job for you.
    Regards
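    If the goal is simply to push a large block of text to the client reliably, a plain servlet that writes the response in chunks and flushes after each one sidesteps the JSP page buffer entirely. A minimal sketch, assuming the text is already available as a String; the 8 KB chunk size and the buildText() helper are made up for the example:

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;

    public class LargeTextServlet extends HttpServlet {
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException, ServletException {
            String myText = buildText();      // stand-in for the real 300 kb text
            response.setContentType("text/plain");

            PrintWriter out = response.getWriter();
            int chunk = 8 * 1024;
            for (int i = 0; i < myText.length(); i += chunk) {
                out.write(myText, i, Math.min(chunk, myText.length() - i));
                out.flush();                  // push each chunk to the client
            }
            out.close();
        }

        private String buildText() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10000; i++) sb.append("line ").append(i).append('\n');
            return sb.toString();
        }
    }

    If the truncation persists even with explicit flushes, one thing worth checking is whether something between the server and the client (a proxy, or the VBA HTTP call mentioned above) is closing the connection early.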

  • Problem with selecting text in Adobe Reader XI

    Hi all, I am encountering a problem with Adobe Reader XI and it is very strange I could not find an alike issue on the internet so I guess I have to submit a question with it.
    So here it is: I am using Adobe Reader XI Version 11.0.2, operating system Windows 7. I do not know when it started, but it has a problem with selecting text to be copied in PDF documents. I ensure that the documents are not scanned documents but word-based documents (or whatever you call it, sorry I cannot think of a proper name for it).
    Normally you select the text/paragraph you want to copy and it will be highlighted, but the problem in this case is that I cannot select the text/paragraph; the blinking pointer (| <-- blinking pointer) just stays at the same location, so I cannot select/highlight anything to be copied. It happens often, not all the time but about 90% of it.
    This is very annoying, as my work involves a lot of copying text from PDF documents. I have to close the PDF file and then open it again so I can select the text, but then after the first copy or the second (if I am lucky) the problem happens again. For short pieces of text I have to type them myself; for a paragraph I have to close all open PDF documents and open them again so I can select the paragraph to copy. I have run out of patience with this; it causes trouble and extra time for me just to copy those texts from PDF documents. Does this problem happen to anyone else, and do you have a solution for it? I would much appreciate it if you could help me out, thank you!

    Yeah, I totally agree, this is very strange. I have always been using Adobe Reader, but this problem only appeared about three months ago. It must be some newly installed software, I think. But I have no idea.
    About your additional question: after selecting the texts and pressing Ctrl + C, the texts are copied and nothing strange happens. It's just that right after I manage to copy the texts, it takes me a while to be able to select the texts again. For your information, I just tested selecting the texts and then pressing Ctrl, and the problem happened. Then I tried pressing C and then other letters, and they all led to the problem.
    I guess I have to stick with left-click + Copy until I or someone else figures out the source of this problem. Thanks a lot for your help!

  • ITunes 8.2 has problems handling larger music files

    Hi,
    After upgrading to iTunes 8.2, I have begun to notice that the program stalls with larger mp3 files. I have quite a lot of music files of 200 MB and above, and before opening them iTunes stops for about 10-15 seconds (with a "beach ball" on the screen), then the playback starts normally. This is not a showstopper, but it is pretty annoying. iTunes 8.2 behaves the same way both on my black MacBook and on my old G4 iMac running Tiger. I never noticed such behaviour with the earlier 8.1 version of iTunes.
    Has anyone encountered similar problem and could there be any remedy? Thanks in advance.
    Cheers,
    Matt

    iso_omena,
    I can't duplicate your problem here, so maybe there's something about your machine or the software installed on it. Do you have any iTunes plugins installed?
    You can also take a sample of iTunes while it's hung and send it to our good buddy, Roy, who will forward it to the right Apple engineers. To sample iTunes:
    1. Open the terminal app in /Applications/Utilities
    2. Get iTunes into the state where you have the spinning cursor and the music's not playing.
    3. At the command prompt in the terminal window, type in "sample iTunes 10". This will sample iTunes for 10 seconds and then tell you where it put the output file.
    4. You need to send that file to <[email protected]>. Make sure that you include:
    (1) the sample file.
    (2) the link to this thread.
    (3) a one line description of your problem, i.e. "iTunes crashes on launch".
    (4) the username that you're using here in the discussion boards.
    Roy is getting a little swamped with messages and needs to make sure he gets all the information. And if he doesn't get that information, the message is just going to get dropped on the floor.

  • Editing and changing large text file

    hi,
    new to this, so bear with me.
    I've got a large text file (44 MB) and I need to change some values in it.
    example:
    TSX ;20030102;40302216;40300579;1980;1900;3762000
    i need to change the lines so that they read:
    TSX ;20030102;302216;300579;1980;1900;3762000
    thus removing the leading 40 in the middle columns.
    Thanks in advance
    john

    crap, small mistake
    1) use BufferedReader to read in the file line by line (BufferedReader.readLine())
    2a) for each line, split it on the semicolons (String.split())
    2b) change the middle value using String.substring()
    2c) construct a new line by appending all strings in the array returned by 2a) to each other
    2d) write this new line to a file using PrintStream (PrintStream.println())
    3) when done, close both the reader and the PrintStream. A quick sketch of these steps is below.
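    Here is a minimal sketch of those steps, assuming every line looks like the example above (semicolon-separated, with the values to trim in the third and fourth columns); the file names input.txt and output.txt are placeholders:

    import java.io.BufferedReader;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintStream;

    public class StripLeading40 {
        public static void main(String[] args) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
                 PrintStream out = new PrintStream(new FileOutputStream("output.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {           // step 1
                    String[] cols = line.split(";");                   // step 2a
                    if (cols.length > 3) {
                        cols[2] = cols[2].substring(2);                // step 2b: drop the leading "40"
                        cols[3] = cols[3].substring(2);
                    }
                    out.println(String.join(";", cols));               // steps 2c and 2d
                }
            }                                                          // step 3: both closed here
        }
    }

    Because it streams one line at a time, memory use stays small even for a 44 MB file.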

  • I am facing problem to get text file from application sever

    Hi This is lokesh.
    Actually my requirement is to create sales orders by getting a file from another server. For that I have used a shell script to connect to that server, and I am connecting to that different server (not an SAP server) successfully.
    But my problem is getting the text file from the application server.
    The people on that other server will send the file name as 'DDHHMMSS' (day, hours, minutes, seconds).
    For example, if they send a file today the name might come as '20103025', and if they send another file within half an hour its name might be '20110025'.
    So the file name always varies. How can I read that type of text file from the application server?
    And one more thing: is there a chance to read more than one text file from the application server?
    Please guide me if you know the solution. This requirement is urgent.
    Regards,
    Lokeshgoud

    Hi..,
    Just execute this program... this is for files received within just 3 minutes.
    Change it according to your requirement!
    DATA time TYPE sy-uzeit VALUE '100000'. " <<--- start time
    DATA file(8) TYPE c VALUE '20'.         " '20' is the day part of the file name
    DO 180 TIMES.                           " <<--- for 3 minutes
      ADD sy-index TO time.
      CONCATENATE file time INTO file.
      WRITE / file.
      OPEN DATASET file FOR INPUT IN TEXT MODE ENCODING DEFAULT.
      IF sy-subrc EQ 0.
        PERFORM read_dataset.
      ENDIF.
      file = '20'.
      time = '100000'.
    ENDDO.
    FORM read_dataset.
      " read the file here using the file name in FILE
    ENDFORM.
    regards,
    sai ramesh
