How to Efficiently Sample a Fixed Number of Rows

Good afternoon. I need to select a specific number of random rows from a table. I believe my logic is right, but it takes too long: 30 minutes for a routine data size. Hopefully someone here can show me a more efficient query. I've seen the SAMPLE clause, but it selects rows randomly on a one-by-one basis, without a guaranteed total count.
This is the idea:
INSERT INTO Tmp_Table (Value, Sequence) SELECT Value, DBMS_RANDOM.VALUE FROM Perm_Table;
SELECT Value FROM Tmp_Table WHERE ROWNUM <= 1234 ORDER BY Sequence;
I'd need to put the ORDER BY in a subselect for ROWNUM to work correctly, but anyway that's just an illustration. My actual need is a little more complicated. I have many sets of data; each set has many rows; and for each set I need to return a specific (different) number of rows. Perhaps project A has three rows in this table, and I want to keep two of them; project B has two rows, and I want to keep one of them. So I need to identify, for each row, whether it's valid for that project. This is what my data looks like:
Project Person Sequence Position Keeper
A       Bill   1234     1        Yes
A       Fred   5678     3        No
A       George 1927     2        Yes
B       April  5784     2        No
B       Janice 2691     1        Yes
I populate Sequence with random values, then calculate the position of each person within their project, and finally discard people whose Position is greater than Max_Targets for the Project. Fred and April have the highest random numbers, so they're cut. It's not the case that I'm just trimming one person from each project; the actual percentage kept will range from zero to 100.
Populating the list with random values is not time-consuming, but calculating Position is. This is my code:
UPDATE Tmp_Targets T1
SET Position = (
   SELECT
      COUNT(*)
   FROM
      Perm_Targets PT1
      INNER JOIN Perm_Targets PT2 ON PT1.Project = PT2.Project
      INNER JOIN Tmp_Targets T2 ON PT2.Target = T2.Target
   WHERE
          T1.Target = PT1.Target
      AND T2.Sequence <= T1.Sequence
);
The Target fields are PKs, and the Project and Sequence fields are indexed. Is there a better way to approach this? I could write a cursor that pulls out project codes and performs the above operations for each project in turn; that would be logically simpler and possibly faster. Has anyone here addressed a similar problem before? I'd appreciate any ideas.
This is on 9.2, in case it matters. Thank you,
Jonathan

You've not given any indication of how max targets for a given project is determined, so for my example I'm using the ceiling of 1/2 of the number of records in each project which gives the same number of yes and no responses per project as you had:
with dta as (
  select 'A' project, 'Bill' person from dual union all
  select 'A', 'Fred' from dual union all
  select 'A', 'George' from dual union all
  select 'B', 'April' from dual union all
  select 'B', 'Janice' from dual
), t1 as (
  select project
       , person
       , row_number() over (partition by project order by dbms_random.value) ord
       , count(*) over (partition by project) cnt
       , rownum rn
    from dta
)
select project
     , person
     , ord
     , cnt
     , case when ord <= ceil(cnt/2) then 'Yes' else 'No' end keep
  from t1
 order by rn
PROJECT PERSON ORD CNT KEEP
A       Bill   2   3   Yes
A       Fred   3   3   No
A       George 1   3   Yes
B       April  1   2   Yes
B       Janice 2   2   No
5 rows selected
In this example, the middle query uses an analytic function to assign a random ordering to each record within a project. The final query determines the Yes/No status from that ordering and the count of records in the project. If you had a table of projects holding the threshold, you could join to it and use the threshold in place of the ceil(cnt/2) portion of the inequality in the case expression.
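For anyone who wants to check the logic outside the database, the same idea (a random position within each project, then keep positions up to the project's threshold) can be sketched in Python. This is only an illustration: mark_keepers, the max_targets dictionary and the assumption that (project, person) pairs are unique are all hypothetical, not something from the original tables.

```python
import random
from collections import defaultdict


def mark_keepers(rows, max_targets, rng=random):
    """Assign each (project, person) a random position within its project
    and flag it as a keeper when the position is within the threshold.

    rows        -- list of (project, person) tuples, assumed unique
    max_targets -- hypothetical dict: project -> number of rows to keep
    rng         -- the random module or a random.Random instance
    """
    # Group people by project, like PARTITION BY project.
    by_project = defaultdict(list)
    for project, person in rows:
        by_project[project].append(person)

    # Shuffle each group and number it from 1, like
    # ROW_NUMBER() OVER (PARTITION BY project ORDER BY dbms_random.value).
    position = {}
    for project, people in by_project.items():
        shuffled = list(people)
        rng.shuffle(shuffled)
        for pos, person in enumerate(shuffled, start=1):
            position[(project, person)] = pos

    # Return rows in their original order with position and keep flag.
    result = []
    for project, person in rows:
        pos = position[(project, person)]
        keep = pos <= max_targets.get(project, 0)
        result.append((project, person, pos, keep))
    return result


sample = [("A", "Bill"), ("A", "Fred"), ("A", "George"),
          ("B", "April"), ("B", "Janice")]
for row in mark_keepers(sample, {"A": 2, "B": 1}):
    print(row)
```

Each project is shuffled once, so the work is proportional to the number of rows, which is the same reason the analytic query avoids the cost of the correlated COUNT(*) update.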

Similar Messages

How can I get a fixed number of rows on a SELECT?

I'm interested in getting the last xx records of a specific query that returns more than xx rows. How can I do this?

You can use "where rownum < xx" if you wrap it around an ordered subquery:
select * from (select ... from ... order by ...) where rownum < 10
But you can't write "rownum > 10", because no record will be returned.
Therefore you have to do it like this:
select * from (select ..., rownum as nummer from (select ... from ... order by ...)) where nummer > 10
~
pascal

How to display a fixed number of rows in a page when using CL_GUI_ALV_GRID

Hi experts,
How can I display a fixed number of rows per page when using CL_GUI_ALV_GRID, let's say 500? My display table may contain 10,000 rows in some cases, and evidently I can't see all of them at once.
I have a button in my toolbar which triggers this event (display 500 records), but I don't have the logic to do this using only methods of CL_GUI_ALV_GRID.
Can you tell me a standard method of CL_GUI_ALV_GRID which can help me do this? Any hint will be good.
Until now I have added a column to my structure which holds a flag, a number identifying each batch of 500 records:
first  500 - flag -> 1
second 500 - flag -> 2
etc.
But I'm convinced there is an easier way of doing this without damaging my structure.
Thanks in advance. Don't be shy, reply if you have any hints.

Hi,
if method SET_FILTER_CRITERIA doesn't help, I think that you must work with 2 internal tables, a counter and a loop for filtering the records to be displayed:
refresh int_table2.
case counter.
  when 1.
    loop at int_table1 from 1 to 500.      "<-- your table with all records
      append int_table1 to int_table2.
    endloop.
  when 2.
    loop at int_table1 from 501 to 1000.
      append int_table1 to int_table2.
    endloop.
  " etc, etc.
endcase.
call method grid->set_table_for_first_display
  changing
    it_outtab = int_table2.                "<-- instead of your current table int_table1
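The CASE branches above all follow one rule: page N covers rows (N - 1) * 500 + 1 through N * 500. As a rough sketch of that arithmetic, here it is in Python rather than ABAP; page_of and its parameters are hypothetical names used only for illustration.

```python
def page_of(records, page_number, page_size=500):
    """Return the slice of records shown on a 1-based page number.

    Page 1 holds records 1..page_size, page 2 the next page_size, and so
    on. A page past the end of the data is simply empty.
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * page_size  # 0-based index of the first row
    return records[start:start + page_size]


all_records = list(range(1, 1201))  # stand-in for the full internal table
print(len(page_of(all_records, 1)), len(page_of(all_records, 3)))
```

Computing the bounds from the counter would replace the repeated WHEN branches with a single loop over the computed range.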

How to show a fixed number of rows in JTable

Hi,
I have to show only a fixed number of rows in the table.
The number of rows must not change after scrolling.

I don't understand the question.
The number of visible rows is dependent on the size of the scroll pane.
Scrolling does not change the number of rows that are visible.

Fixed number of rows in ADF table

Can we specify a fixed number of rows for an ADF table, so that even when there are no rows to display it will still show 10 empty rows?

The rangesize property determines how many rows will be displayed and controls paging. If there are no rows to begin with (if I understand the question correctly), rangesize won't have any effect.

Create spreadsheet file with a fixed number of rows

What is the most straight forward way to create a series of spreadsheet files each with a new file name and fixed number of rows. We have a data acquisition process that creates a new 1D array every 2 seconds. We'd like to build a series of spreadsheet files each having two hours or 3600 rows of data. Is there a best way to do this in LV9?
john

Use the low-level File I/O VIs with Write to Text File.vi, opening a new file every N iterations.
You just have to convert your 1D array to a string first.
If you would like to have a new file every N hours, you should create an FGV which checks the elapsed time using Get Date/Time in Seconds.vi, which is more appropriate for long-running applications.
Christian

How to display fixed number of rows on rtf file

Hi everyone,
I have the following requirement. The PDF output should have a fixed number of 33 lines in each group. The number of rows can vary from group to group. For example, if a group has 3 rows, the PDF output should print 3 lines and 30 empty lines, and if a group has 2 rows, the output should print 2 lines and 31 empty lines.
Can anyone please help me with this requirement?
Thanks
Sunny

Take a look at this blog post: http://blogs.oracle.com/xmlpublisher/entry/anatomy_of_a_template_i_fixed
Thanks,
Bipuser

Fixed number of rows

Hi all,
I want to display only a fixed number of rows in the output, but there may be more records in the internal table. How can I do that?
Suppose I want to display only 10 records in the output, but there may be more than 10 records in the internal table, and we do not want to fix the limit in the selection query. How should we do that?

Hi,
Use a counter and move the fixed number of records to another internal table and display it.
loop at itab.
  move-corresponding itab to itab1.
  append itab1.
  count = count + 1.
  if count >= 10.    " stop once 10 records have been copied
    exit.
  endif.
endloop.
Now use itab1 to display the output.
Regards,
Vikranth

How do I increase the total number of rows?

This seems like a dumb question, but I can only seem to be able to add one new row at the bottom of my spreadsheet at a time using the '=' in a circle button that appears below the last row of the spreadsheet.
I want to add hundreds of rows, and I can only seem to add one at a time which seems ridiculous and a huge waste of time. There has to be a way to add a huge block of empty rows to the spreadsheet right? Can you really only add one new blank row at a time?

You can select rows, then use the key combination <option> + <down arrow> to add the same number of rows you selected for each key press.
If you select five rows, then each time you type the key combination Numbers will add five rows.
If I want to add many rows, I select all rows, type the key combination, select all rows again, type the key combination again, and repeat as needed.

How to determine count for the number of rows

Appreciate if any of you could think of a way of determining the count for the number of rows in the subquery without having to run another query.
SELECT *
FROM (
  SELECT rownum, rn, rlp_id, rlp_notes, cad_pid, status, jurisdiction_id, s.state_abbr, rlp_address, rlp_route_id, rlp_route_section, psma_version
  FROM ipod.relevant_land_parcels r, state s
  WHERE s.state_pid = r.state_pid(+)
    AND rlp_route_id = 'SM1'
    AND status = 'CURRENT'
)
WHERE rn > 200 AND rn < 216
I want to consume this from a .NET / C# environment.

Something like this,.....????
SQL> select * from emp;
EMPNO ENAME      JOB         MGR HIREDATE          SAL      COMM DEPTNO
7369 SMITH      CLERK      7902 17/12/1980     800,00               20
7499 ALLEN      SALESMAN   7698 20/02/1981    1600,00    300,00     30
7521 WARD       SALESMAN   7698 22/02/1981    1250,00    500,00     30
7566 JONES      MANAGER    7839 02/04/1981    2975,00               20
7654 MARTIN     SALESMAN   7698 28/09/1981    1250,00   1400,00     30
7698 BLAKE      MANAGER    7839 01/05/1981    2850,00               30
7782 CLARK      MANAGER    7839 09/06/1981    2450,00               10
7788 SCOTT      ANALYST    7566 19/04/1987    3000,00               20
7839 KING       PRESIDENT       17/11/1981    5000,00               10
7844 TURNER     SALESMAN   7698 08/09/1981    1500,00      0,00     30
7876 ADAMS      CLERK      7788 23/05/1987    1100,00               20
7900 JAMES      CLERK      7698 03/12/1981     950,00               30
7902 FORD       ANALYST    7566 03/12/1981    3000,00               20
7934 MILLER     CLERK      7782 23/01/1982    1300,00               10
14 rows selected
SQL>
SQL> select max(rw) from
2 (
3 select empno , row_number () over (order by empno) rw from emp
4 where job='CLERK'
5 )
6 /
   MAX(RW)
         4
Greetings...
Sim

3 queries, how to make it fixed number of rows

Hi. I have 3 queries with the following structure: Q1 is the customer, Q2 the sales orders, and Q3 the lines in the sales orders.
Q1
Q2
Q3
Q3
Q2
Q1
I followed the one suggested in thread: Can you limit returned rows in a loop?
regarding value for each text form field as follows
text1 = <xsl:variable name="lpp" select="number(5)"/>
text2 = <?for-each@section:LIST?> <xsl:variable xdofo:ctx="incontext" name="group" select=".//LINES"/>
<?for-each:$group?><?if:(position()-1) mod $lpp=0?><xsl:variable name="start" xdofo:ctx="incontext" select="position()"/>
text3 = <?for-each:$group?><?if:position()>=$start and position()<$start+$lpp?>
text4 = <?LINE?>
text5 = <?end if?><?end for-each?>
text6 = <?sum($group[(position()>=$start) and (position()<($start+$lpp))]/LINE)?>
text7 = <?if:not(count($group) mod $lpp=0) and ($start+$lpp>count($group))?>
text8 = <?end if?><?end for-each?><?end if?>
text9 = <?if:count($group)<$start+$lpp?>
text10 = <?end if?>
text11 = <xsl:if xdofo:ctx="inblock" test="$start+$lpp<=count($group)"><xsl:attribute name="break-before">page</xsl:attribute></xsl:if>
text12 = <?end if?><?end for-each?><?end for-each?>
This worked for me when I had only 2 queries, but now I am using 3 queries. How do I apply this format when 3 queries are involved?
Thanks.

How do I find out the number of rows in a resultset ?

Without scrolling it.

And explain to me, oh sarcastic one, how it wouldn't be a better idea to put that data into a data structure and pass that back.
1. Performance with very large ResultSet objects for which only sparse data is desired (that's what pages are for).
2. Resource optimization.
3. Efficient sequential processing.
4. The ability to intelligently determine based on resource usage whether to page the results or use your implementation for small ResultSet objects.
Answering your points:
1. I just showed him a very easy way to get the size.
2. I don't see any advantage of a collection (if that's what you were envisioning) over a paged ResultSet. Actually the ResultSet is a collection of sorts.
3. Your data is reliant on the connection initially anyway. The collection would only provide an advantage in the possibly short window while the data was being viewed.
4. You can always resort a ResultSet by requerying.

How to find out the current number of rows in a form without navigation

Hi.
Is there any way to count the rows in a form (block) without navigation to the last record?
I am modifying CUSTOM.pll and have to count the rows before user commits changes.
All records are new in this case. Can anyone help me? Thanks.
Regards
Tomáš

Magoo wrote:
No, such a block property unfortunately does not exist.
You can just go to the block, call the LAST_RECORD built-in and find out where the cursor is.
But this calls restricted procedures, and they are not allowed everywhere ...
If you execute a query on a block, not all records may be retrieved from the database.
Because of this, Forms does not know how many records are really in the block, and
so there is no built-in like get_block_property ( records_count ).
It does exist indeed: GET_BLOCK_PROPERTY('BLOCK', QUERY_HITS);
Of course, this will return the number of records that would get fetched to the block (based on the where condition), but not the records with NEW status (i.e., new records which are not yet committed).
-Arun

Getting a fixed number of rows from various items

Hello,
I am new to Oracle and I am having a hard time resolving this problem. I will try to explain as best as possible.
The application requires that I build a list of items that have been shipped. As they may have various shipping companies, there may be various tracking numbers for each item shipped. The application requires that I get a list of the items that have been shipped that meet certain criteria, then for each item, list the first three tracking numbers that are found. If the shipped item has more than three tracking numbers, the others beyond the third are ignored. I built a SELECT statement using the rownum "column" to limit my result set to a maximum of three. My intention was to build a list of item ids, then pass those to a stored procedure, then loop for each item id, getting the first three tracking numbers. The tracking numbers returned would be appended to a cursor, which would return the entire set of tracking numbers for the list of item ids.
Any recommendations as to how to approach this requirement would be greatly appreciated. I have gotten through the SELECT that returns the list of item ids. I am having trouble finding the correct approach to getting the three tracking numbers for each item id. One solution was to have the application cycle through each item id, but I am hoping that this can be done within the stored procedure so that it is called only once for the returned list of item ids.
I wish to thank you in advance. This is all the information I can provide, as I don't have any DDL statements. I hope it will suffice.
Salvador

Please see if you can use something like this:
SELECT a.item, a.tracking_number
  FROM table_name a
 WHERE 3 >=
       (SELECT COUNT (*) + 1
          FROM table_name b
         WHERE a.item = b.item
           AND b.ROWID < a.ROWID
           AND b.criteria = 'whatever')
   AND a.criteria = 'whatever'
 ORDER BY a.item, a.tracking_number
For example, if you have a table named items_shipped that has columns named item, tracking_num, and shipped, and you want all the items that meet the criteria that shipped = 'Y' and the first three tracking_nums for those items, then your query would look something like this:
SELECT a.item, a.tracking_num
  FROM items_shipped a
 WHERE 3 >=
       (SELECT COUNT (*) + 1
          FROM items_shipped b
         WHERE a.item = b.item
           AND b.ROWID < a.ROWID
           AND b.shipped = 'Y')
   AND a.shipped = 'Y'
 ORDER BY a.item, a.tracking_num
You have given very limited information. If the above suggestions are not sufficient, please provide your table structure, including table names, column names, data types and lengths, some sample data, what criteria must be met, and a sample of what you would like the output to be.
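To make the "first three per item" rule concrete, here is a small Python sketch of the same selection. It is only an illustration: first_n_per_item is a hypothetical helper, and "first" here means input order, much as the query above relies on ROWID order rather than on any business ordering.

```python
from collections import defaultdict


def first_n_per_item(rows, n=3):
    """Keep at most n (item, tracking_number) rows per item.

    rows are taken in the order given; any row beyond the n-th for an
    item is ignored, like the COUNT(*) + 1 <= 3 test in the query.
    """
    seen = defaultdict(int)  # item -> how many rows kept so far
    kept = []
    for item, tracking_number in rows:
        if seen[item] < n:
            seen[item] += 1
            kept.append((item, tracking_number))
    return kept


shipments = [("X", "T1"), ("X", "T2"), ("Y", "T9"),
             ("X", "T3"), ("X", "T4"), ("Y", "T8")]
print(first_n_per_item(shipments))
```

This runs in a single pass over the rows, whereas the correlated subquery counts earlier rows again for every row it checks.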

How do I process a large number of rows using ADO?

I need to iterate through a table with about 12 million records and process each row individually. The project I'm doing cannot be resolved with a simple UPDATE statement.
My concern is that when I perform the initial query that so much data will be returned that I'll simply crash.
Ideally I would get to a row, perform my operation, then go to the next row ... and so on and so on.
I am using ADO / C++

I suggest you simply use the default fast-forward read-only (firehose) cursor to read the data. This will stream data from SQL Server to your application, and client memory usage will be limited to the internal API buffers without resorting to paging.
I ran a quick test of this technique using ADO classic and the C# code below and it ran in under 3 minutes (35 seconds without the file processing) on my Surface Pro against a remote SQL Server. I would expect C++ to be significantly faster since it won't incur the COM interop penalty. The same test with SqlClient ran in under 10 seconds.
static void test()
{
    var sw = Stopwatch.StartNew();
    Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff"));
    object recordCount;
    var adoConnection = new ADODB.Connection();
    adoConnection.Open(@"Provider=SQLNCLI11.1;Server=serverName;Database=MarketData;Integrated Security=SSPI");
    var outfile = new StreamWriter(@"C:\temp\MarketData.txt");
    // Execute returns the default forward-only, read-only recordset.
    var adoRs = adoConnection.Execute("SELECT TOP(1200000) Symbol, TradeTimestamp, HighPrice, LowPrice, OpenPrice, ClosePrice, Volume FROM dbo.OneMinuteQuote;", out recordCount);
    while (!adoRs.EOF)
    {
        // Process one row at a time; only the current row is held in memory.
        outfile.WriteLine("{0},{1},{2},{3},{4},{5},",
            (string)adoRs.Fields[0].Value.ToString(),
            ((DateTime)adoRs.Fields[1].Value).ToString(),
            ((Decimal)adoRs.Fields[2].Value).ToString(),
            ((Decimal)adoRs.Fields[3].Value).ToString(),
            ((Decimal)adoRs.Fields[4].Value).ToString(),
            ((Decimal)adoRs.Fields[5].Value).ToString());
        adoRs.MoveNext();
    }
    adoRs.Close();
    adoConnection.Close();
    outfile.Close();
    sw.Stop();
    Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff"));
    Console.WriteLine(sw.Elapsed.ToString());
}
Dan Guzman, SQL Server MVP, http://www.dbdelta.com
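The same read-as-you-go pattern, sketched in Python with the standard sqlite3 module instead of ADO, may help show why memory stays flat. The stream_rows helper, the quotes table and the batch size are all made up for the illustration; nothing here is part of ADO or SQL Server.

```python
import sqlite3


def stream_rows(conn, sql, handle_row, batch_size=1000):
    """Run sql and pass each row to handle_row without loading the whole
    result set into memory. Returns the number of rows processed."""
    cur = conn.cursor()
    cur.execute(sql)
    processed = 0
    while True:
        batch = cur.fetchmany(batch_size)  # at most batch_size rows at once
        if not batch:
            break
        for row in batch:
            handle_row(row)
            processed += 1
    cur.close()
    return processed


# Hypothetical in-memory table standing in for the large source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, close_price REAL)")
conn.executemany("INSERT INTO quotes VALUES (?, ?)",
                 [("SYM%d" % i, float(i)) for i in range(5000)])
total = stream_rows(conn, "SELECT symbol, close_price FROM quotes",
                    lambda row: None)
print(total)
conn.close()
```

Only one batch is held by the client at any moment, which is the point of the forward-only cursor: the per-row work can be arbitrarily complex without the row count affecting memory use.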
