I/O Write Performance
Hello,
we are currently experiencing heavy I/O problems while performing proof-of-concept
testing for one of our customers. Our setup is as follows:
HP ProLiant DL380 with 24GB RAM and 8 x 72GB 15k SAS drives.
An HP P400 RAID controller with 256MB cache in RAID0 mode was used.
Windows Server 2008 R2 was installed on C: (a single physical drive) and the database on E:
(two physical drives in RAID0, 128k stripe size).
Read and write tests were performed with the remaining 5 drives in RAID0 with a varying number of drives.
I/O throughput, as measured with the ATTO Disk Benchmark, increased linearly with the number of drives, as expected.
We expected to see this improvement in the database too, and performed the following tests:
- a full table scan (FTS) on 3 different tables (hint: /*+ FULL(s) NOCACHE(s) */)
- a CTAS statement (see the sketch below).
The system was used exclusively for testing.
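The tests looked roughly like this (a sketch only; the table names are placeholders, not our real tables, and NOLOGGING is an assumption to keep redo out of the write path):

SELECT /*+ FULL(s) NOCACHE(s) */ COUNT(*) FROM test_table s;   -- FTS read test, bypassing the buffer cache

CREATE TABLE test_table_copy NOLOGGING                         -- CTAS write test (placeholder names)
AS SELECT * FROM test_table;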
The tables used:
Table 1: 312 columns, 12,248 MB, 11,138,561 rows, avg row length 621 bytes
Table 2: 159 columns, 4,288 MB, 5,441,171 rows, avg row length 529 bytes
Table 3: 118 columns, 360 MB, 820,259 rows, avg row length 266 bytes
The FTS improved as expected: with 5 physical drives in a RAID0, a throughput of
420MB/s was achieved.
In the write tests, on the other hand, we were not able to achieve any improvement.
A single CTAS statement always runs at about 5,000-6,000 blocks/s (80MB/s).
But when we ran several CTAS statements in different sessions, the overall throughput increased as expected.
Further tests showed that the write speed also seems to depend on the number of columns: 80MB/s was only
reached with Tables 2 and 3; with Table 1, only 30MB/s was measured.
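For reference, one way to read such a block rate from the instance (a sketch; sample before and after the statement and divide the delta by the elapsed seconds):

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('physical writes', 'physical writes direct');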
Is this maybe just an incorrectly set parameter?
What we already tried (see the sketch after this list):
- changing db_writer_processes to 4 and then to 8
- manual configuration of the PGA and SGA sizes
- setting DB_BLOCK_SIZE to 16k
- setting FILESYSTEMIO_OPTIONS to SETALL
- checking that Resource Manager is really disabled
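The parameter changes were made along these lines (a sketch of the commands, not necessarily the exact ones we ran; both parameters are static, so an instance restart is required):

ALTER SYSTEM SET db_writer_processes = 8 SCOPE = SPFILE;       -- takes effect after restart
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE = SPFILE; -- enables asynchronous and direct I/O after restart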
Thanks for any help.
V$PARAMETER output:
1 lock_name_space
2 processes 150
3 sessions 248
4 timed_statistics TRUE
5 timed_os_statistics 0
6 resource_limit FALSE
7 license_max_sessions 0
8 license_sessions_warning 0
9 cpu_count 8
10 instance_groups
11 event
12 sga_max_size 14495514624
13 use_large_pages TRUE
14 pre_page_sga FALSE
15 shared_memory_address 0
16 hi_shared_memory_address 0
17 use_indirect_data_buffers FALSE
18 lock_sga FALSE
19 processor_group_name
20 shared_pool_size 0
21 large_pool_size 0
22 java_pool_size 0
23 streams_pool_size 0
24 shared_pool_reserved_size 93952409
25 java_soft_sessionspace_limit 0
26 java_max_sessionspace_size 0
27 spfile C:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEORATEST.ORA
28 instance_type RDBMS
29 nls_language AMERICAN
30 nls_territory AMERICA
31 nls_sort
32 nls_date_language
33 nls_date_format
34 nls_currency
35 nls_numeric_characters
36 nls_iso_currency
37 nls_calendar
38 nls_time_format
39 nls_timestamp_format
40 nls_time_tz_format
41 nls_timestamp_tz_format
42 nls_dual_currency
43 nls_comp BINARY
44 nls_length_semantics BYTE
45 nls_nchar_conv_excp FALSE
46 fileio_network_adapters
47 filesystemio_options
48 clonedb FALSE
49 disk_asynch_io TRUE
50 tape_asynch_io TRUE
51 dbwr_io_slaves 0
52 backup_tape_io_slaves FALSE
53 resource_manager_cpu_allocation 8
54 resource_manager_plan
55 cluster_interconnects
56 file_mapping FALSE
57 gcs_server_processes 0
58 active_instance_count
59 sga_target 14495514624
60 memory_target 0
61 memory_max_target 0
62 control_files E:\ORACLE\ORADATA\ORATEST\CONTROL01.CTL, C:\ORACLE\FAST_RECOVERY_AREA\ORATEST\CONTROL02.CTL
63 db_file_name_convert
64 log_file_name_convert
65 control_file_record_keep_time 7
66 db_block_buffers 0
67 db_block_checksum TYPICAL
68 db_ultra_safe OFF
69 db_block_size 8192
70 db_cache_size 0
71 db_2k_cache_size 0
72 db_4k_cache_size 0
73 db_8k_cache_size 0
74 db_16k_cache_size 0
75 db_32k_cache_size 0
76 db_keep_cache_size 0
77 db_recycle_cache_size 0
78 db_writer_processes 1
79 buffer_pool_keep
80 buffer_pool_recycle
81 db_flash_cache_file
82 db_flash_cache_size 0
83 db_cache_advice ON
84 compatible 11.2.0.0.0
85 log_archive_dest_1
86 log_archive_dest_2
87 log_archive_dest_3
88 log_archive_dest_4
89 log_archive_dest_5
90 log_archive_dest_6
91 log_archive_dest_7
92 log_archive_dest_8
93 log_archive_dest_9
94 log_archive_dest_10
95 log_archive_dest_11
96 log_archive_dest_12
97 log_archive_dest_13
98 log_archive_dest_14
99 log_archive_dest_15
100 log_archive_dest_16
101 log_archive_dest_17
102 log_archive_dest_18
103 log_archive_dest_19
104 log_archive_dest_20
105 log_archive_dest_21
106 log_archive_dest_22
107 log_archive_dest_23
108 log_archive_dest_24
109 log_archive_dest_25
110 log_archive_dest_26
111 log_archive_dest_27
112 log_archive_dest_28
113 log_archive_dest_29
114 log_archive_dest_30
115 log_archive_dest_31
116 log_archive_dest_state_1 enable
117 log_archive_dest_state_2 enable
118 log_archive_dest_state_3 enable
119 log_archive_dest_state_4 enable
120 log_archive_dest_state_5 enable
121 log_archive_dest_state_6 enable
122 log_archive_dest_state_7 enable
123 log_archive_dest_state_8 enable
124 log_archive_dest_state_9 enable
125 log_archive_dest_state_10 enable
126 log_archive_dest_state_11 enable
127 log_archive_dest_state_12 enable
128 log_archive_dest_state_13 enable
129 log_archive_dest_state_14 enable
130 log_archive_dest_state_15 enable
131 log_archive_dest_state_16 enable
132 log_archive_dest_state_17 enable
133 log_archive_dest_state_18 enable
134 log_archive_dest_state_19 enable
135 log_archive_dest_state_20 enable
136 log_archive_dest_state_21 enable
137 log_archive_dest_state_22 enable
138 log_archive_dest_state_23 enable
139 log_archive_dest_state_24 enable
140 log_archive_dest_state_25 enable
141 log_archive_dest_state_26 enable
142 log_archive_dest_state_27 enable
143 log_archive_dest_state_28 enable
144 log_archive_dest_state_29 enable
145 log_archive_dest_state_30 enable
146 log_archive_dest_state_31 enable
147 log_archive_start FALSE
148 log_archive_dest
149 log_archive_duplex_dest
150 log_archive_min_succeed_dest 1
151 standby_archive_dest %ORACLE_HOME%\RDBMS
152 fal_client
153 fal_server
154 log_archive_trace 0
155 log_archive_config
156 log_archive_local_first TRUE
157 log_archive_format ARC%S_%R.%T
158 redo_transport_user
159 log_archive_max_processes 4
160 log_buffer 32546816
161 log_checkpoint_interval 0
162 log_checkpoint_timeout 1800
163 archive_lag_target 0
164 db_files 200
165 db_file_multiblock_read_count 128
166 read_only_open_delayed FALSE
167 cluster_database FALSE
168 parallel_server FALSE
169 parallel_server_instances 1
170 cluster_database_instances 1
171 db_create_file_dest
172 db_create_online_log_dest_1
173 db_create_online_log_dest_2
174 db_create_online_log_dest_3
175 db_create_online_log_dest_4
176 db_create_online_log_dest_5
177 db_recovery_file_dest c:\oracle\fast_recovery_area
178 db_recovery_file_dest_size 4322230272
179 standby_file_management MANUAL
180 db_unrecoverable_scn_tracking TRUE
181 thread 0
182 fast_start_io_target 0
183 fast_start_mttr_target 0
184 log_checkpoints_to_alert FALSE
185 db_lost_write_protect NONE
186 recovery_parallelism 0
187 db_flashback_retention_target 1440
188 dml_locks 1088
189 replication_dependency_tracking TRUE
190 transactions 272
191 transactions_per_rollback_segment 5
192 rollback_segments
193 undo_management AUTO
194 undo_tablespace UNDOTBS1
195 undo_retention 900
196 fast_start_parallel_rollback LOW
197 resumable_timeout 0
198 instance_number 0
199 db_block_checking FALSE
200 recyclebin on
201 db_securefile PERMITTED
202 create_stored_outlines
203 serial_reuse disable
204 ldap_directory_access NONE
205 ldap_directory_sysauth no
206 os_roles FALSE
207 rdbms_server_dn
208 max_enabled_roles 150
209 remote_os_authent FALSE
210 remote_os_roles FALSE
211 sec_case_sensitive_logon TRUE
212 O7_DICTIONARY_ACCESSIBILITY FALSE
213 remote_login_passwordfile EXCLUSIVE
214 license_max_users 0
215 audit_sys_operations FALSE
216 global_context_pool_size
217 db_domain
218 global_names FALSE
219 distributed_lock_timeout 60
220 commit_point_strength 1
221 global_txn_processes 1
222 instance_name oratest
223 service_names ORATEST
224 dispatchers (PROTOCOL=TCP) (SERVICE=ORATESTXDB)
225 shared_servers 1
226 max_shared_servers
227 max_dispatchers
228 circuits
229 shared_server_sessions
230 local_listener
231 remote_listener
232 listener_networks
233 cursor_space_for_time FALSE
234 session_cached_cursors 50
235 remote_dependencies_mode TIMESTAMP
236 utl_file_dir
237 smtp_out_server
238 plsql_v2_compatibility FALSE
239 plsql_warnings DISABLE:ALL
240 plsql_code_type INTERPRETED
241 plsql_debug FALSE
242 plsql_optimize_level 2
243 plsql_ccflags
244 plscope_settings identifiers:none
245 permit_92_wrap_format TRUE
246 java_jit_enabled TRUE
247 job_queue_processes 1000
248 parallel_min_percent 0
249 create_bitmap_area_size 8388608
250 bitmap_merge_area_size 1048576
251 cursor_sharing EXACT
252 result_cache_mode MANUAL
253 parallel_min_servers 0
254 parallel_max_servers 135
255 parallel_instance_group
256 parallel_execution_message_size 16384
257 hash_area_size 131072
258 result_cache_max_size 72482816
259 result_cache_max_result 5
260 result_cache_remote_expiration 0
261 audit_file_dest C:\ORACLE\ADMIN\ORATEST\ADUMP
262 shadow_core_dump none
263 background_core_dump partial
264 background_dump_dest c:\oracle\diag\rdbms\oratest\oratest\trace
265 user_dump_dest c:\oracle\diag\rdbms\oratest\oratest\trace
266 core_dump_dest c:\oracle\diag\rdbms\oratest\oratest\cdump
267 object_cache_optimal_size 102400
268 object_cache_max_size_percent 10
269 session_max_open_files 10
270 open_links 4
271 open_links_per_instance 4
272 commit_write
273 commit_wait
274 commit_logging
275 optimizer_features_enable 11.2.0.3
276 fixed_date
277 audit_trail DB
278 sort_area_size 65536
279 sort_area_retained_size 0
280 cell_offload_processing TRUE
281 cell_offload_decryption TRUE
282 cell_offload_parameters
283 cell_offload_compaction ADAPTIVE
284 cell_offload_plan_display AUTO
285 db_name ORATEST
286 db_unique_name ORATEST
287 open_cursors 300
288 ifile
289 sql_trace FALSE
290 os_authent_prefix OPS$
291 optimizer_mode ALL_ROWS
292 sql92_security FALSE
293 blank_trimming FALSE
294 star_transformation_enabled TRUE
295 parallel_degree_policy MANUAL
296 parallel_adaptive_multi_user TRUE
297 parallel_threads_per_cpu 2
298 parallel_automatic_tuning FALSE
299 parallel_io_cap_enabled FALSE
300 optimizer_index_cost_adj 100
301 optimizer_index_caching 0
302 query_rewrite_enabled TRUE
303 query_rewrite_integrity enforced
304 pga_aggregate_target 4831838208
305 workarea_size_policy AUTO
306 optimizer_dynamic_sampling 2
307 statistics_level TYPICAL
308 cursor_bind_capture_destination memory+disk
309 skip_unusable_indexes TRUE
310 optimizer_secure_view_merging TRUE
311 ddl_lock_timeout 0
312 deferred_segment_creation TRUE
313 optimizer_use_pending_statistics FALSE
314 optimizer_capture_sql_plan_baselines FALSE
315 optimizer_use_sql_plan_baselines TRUE
316 parallel_min_time_threshold AUTO
317 parallel_degree_limit CPU
318 parallel_force_local FALSE
319 optimizer_use_invisible_indexes FALSE
320 dst_upgrade_insert_conv TRUE
321 parallel_servers_target 128
322 sec_protocol_error_trace_action TRACE
323 sec_protocol_error_further_action CONTINUE
324 sec_max_failed_login_attempts 10
325 sec_return_server_release_banner FALSE
326 enable_ddl_logging FALSE
327 client_result_cache_size 0
328 client_result_cache_lag 3000
329 aq_tm_processes 1
330 hs_autoregister TRUE
331 xml_db_events enable
332 dg_broker_start FALSE
333 dg_broker_config_file1 C:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\DR1ORATEST.DAT
334 dg_broker_config_file2 C:\ORACLE\PRODUCT\11.2.0\DBHOME_1\DATABASE\DR2ORATEST.DAT
335 olap_page_pool_size 0
336 asm_diskstring
337 asm_preferred_read_failure_groups
338 asm_diskgroups
339 asm_power_limit 1
340 control_management_pack_access DIAGNOSTIC+TUNING
341 awr_snapshot_time_offset 0
342 sqltune_category DEFAULT
343 diagnostic_dest C:\ORACLE
344 tracefile_identifier
345 max_dump_file_size unlimited
346 trace_enabled TRUE
961262 wrote:
The tables used:
Table 1: 312 columns, 12,248 MB, 11,138,561 rows, avg row length 621 bytes
Table 2: 159 columns, 4,288 MB, 5,441,171 rows, avg row length 529 bytes
Table 3: 118 columns, 360 MB, 820,259 rows, avg row length 266 bytes
The FTS improved as expected: with 5 physical drives in a RAID0, a throughput of
420MB/s was achieved.
In the write tests, on the other hand, we were not able to achieve any improvement.
A single CTAS statement always runs at about 5,000-6,000 blocks/s (80MB/s).
But when we ran several CTAS statements in different sessions, the overall throughput increased as expected.
Further tests showed that the write speed also seems to depend on the number of columns: 80MB/s was only
reached with Tables 2 and 3; with Table 1, only 30MB/s was measured.
If multiple concurrent CTAS statements can produce higher total write throughput, that tells you the limit is the production of the data, not the writing of it. Notice in your example that nearly 75% of the CTAS time was CPU, not I/O.
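If you want the same effect from a single statement, a parallel CTAS is the usual route (a sketch; the table name and the degree of 4 are illustrative only):

CREATE TABLE big_copy PARALLEL 4 NOLOGGING
AS SELECT /*+ PARALLEL(s 4) */ * FROM big_table s;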
The point about the number of columns is that Table 1 has exceeded the critical 254-column limit. This means Oracle has chained every row internally into two pieces, which introduces lots of extra CPU-intensive operations (consistent gets, table access by rowid, heap block compress), so the CPU time could have gone up significantly, resulting in a lower throughput that you are interpreting as a write problem.
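You can see the chaining directly: ANALYZE (unlike dbms_stats) populates CHAIN_CNT, which counts chained row pieces (a sketch; MY_TABLE is a placeholder):

ANALYZE TABLE my_table COMPUTE STATISTICS;

SELECT table_name, num_rows, chain_cnt
FROM   user_tables
WHERE  table_name = 'MY_TABLE';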
One other thought: if you are currently doing the CTAS as "create as select from {real SAP table}", there may be other side effects that you're not going to see. I would do "create test clone of real SAP table", then "create as select from clone", to try to eliminate any such anomalies.
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
Author: Oracle Core
Similar Messages
-
Write performance in Directory Server 5.0
Hi,
is it possible to generate around 350 updates/second with IDS 5.0?
I haven't chosen any hardware yet, because I can't find anything
on how to size a Directory Server according to write performance.
Does someone have experience with write performance and how it scales
with more CPU / RAM?
Thanks,
Sascha
Sascha Hemmerling, eMail: [email protected]
Dweerkamp 13
24247 Mielkendorf, Tel: +49-4347-713258

Were you trying to create a new index and then reindex the database? If so, did you check the free space of your database filesystem? The error mentions a space problem for the database after reindexing.
-
In which cases do we need to write PERFORM USING/CHANGING
hi,
in which cases do we need to write PERFORM ... USING/CHANGING,
and what exactly are we doing with PERFORM ... USING ... CHANGING?
Please somebody help me.
Thanks,
subhasis

This is an awfully basic question.
Simply press F1 on PERFORM.
And responders take note.
Rob -
NFS write performance 6 times slower than read
Hi all,
I built myself a new home server and want to use NFS to export stuff to the clients. The problem is that I get a big difference between writing to and reading from the share. Everything is connected by GBit network, and raw network speed is fine.
Reading on the clients yields about 31MByte/s, which is almost the native speed of the disks (which are LUKS-encrypted). But writing to the share gives only about 5.1MByte/s in the best case, while writing to the disks locally gives about 30MByte/s. Also, writing with unencrypted rsync from the client to the server gives about 25-30MByte/s, so it is definitely not a network or disk problem. So I wonder if there is anything I could do to improve the write performance of my NFS shares. Here is the config that gives the best results so far:
Server-Side:
/etc/exports
/mnt/data 192.168.0.0/24(rw,async,no_subtree_check,crossmnt,fsid=0)
/mnt/udata 192.168.0.0/24(rw,async,no_subtree_check,crossmnt,fsid=1)
/etc/conf.d/nfs-server.conf
NFSD_OPTS=""
NFSD_COUNT="32"
PROCNFSD_MOUNTPOINT=""
PROCNFSD_MOUNTOPTS=""
MOUNTD_OPTS="--no-nfs-version 1 --no-nfs-version 2"
NEED_SVCGSSD=""
SVCGSSD_OPTS=""
Client-Side:
/etc/fstab
192.168.0.1:/mnt/data /mnt/NFS nfs rsize=32768,wsize=32768,intr,noatime 0 0
Additional info:
NFS to the unencrypted /mnt/udata gives about 20MByte/s reading and 10MByte/s writing.
The internal speed of the disks is about 37-38MByte/s reading/writing for the encrypted one, and 44-45MByte/s for the unencrypted one (a notebook HDD).
I noticed that the load average on the server goes over 10 while the CPU stays at 10-20%.
So if anyone has any idea what might go wrong here please let me know. If you need more information I will gladly provide it.
TIA
seiichiro0185
Last edited by seiichiro0185 (2010-02-06 13:05:23)

Your rsize and wsize look way too big. I just use the defaults and it runs fine.
I don't know what your server is, but I plucked this from BSD Magazine:
There is one point worth mentioning here, modern Linux usually uses wsize and rsize 8192 by default and that can cause problems with BSD servers as many support only wsize and rsize 1024. I suggest you add the option -o wsize=1024,rsize=1024 when you mount the share on your Linux machines.
You also might want to check here for some optimisations http://www.linuxselfhelp.com/howtos/NFS … WTO-4.html
A trick to increase NFS write performance is to disable synchronous writes on the server. The NFS specification states that NFS write requests shall not be considered finished before the data written is on a non-volatile medium (normally the disk). This restricts the write performance somewhat, asynchronous writes will speed NFS writes up. The Linux nfsd has never done synchronous writes since the Linux file system implementation does not lend itself to this, but on non-Linux servers you can increase the performance this way with this in your exports file:
Last edited by sand_man (2010-03-03 00:23:23) -
Improving redo log writer performance
I have a database on RAC (2 nodes)
Oracle 10g
Linux 3
2 servers PowerEdge 2850
I'm tuning my database with Spotlight. I have already received this alert:
"The Average Redo Write Time alarm is activated when the time taken to write redo log entries exceeds a threshold."
The servers are not in RAID5.
How can I improve redo log writer performance?
Unlike most other Oracle write I/Os, Oracle sessions must wait for redo log writes to complete before they can continue processing.
Therefore, redo log devices should be placed on fast devices.
Most modern disks should be able to process a redo log write in less than 20 milliseconds, and often much lower.
To reduce redo write time see Improving redo log writer performance.
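A quick way to see where the redo writes actually stand is to check the average wait times (a sketch; requires access to the v$ views):

SELECT event, total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');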
See Also:
Tuning Contention - Redo Log Files
Tuning Disk I/O - Archive Writer

Some comments on the section that was pulled from Wikipedia. There is some confusion in the market, as there are different types of solid state disks with different pros and cons. The first major point is that the quote pulled from Wikipedia addresses issues with flash hard disk drives. Flash disks are one type of solid state disk that would be a bad solution for redo acceleration (as I will attempt to describe below); they could be useful for accelerating read-intensive applications. The type of solid state disk used for redo logs uses DDR RAM as the storage media. You may decide to discount my advice because I work for one of these SSD manufacturers, but I think if you do enough research you will see the point. There are many articles, and many more customers, who have used SSD to accelerate Oracle.
> Assuming that you are not CPU constrained, moving the online redo to high-speed solid-state disk can make a huge difference.
Do you honestly think this is practical and usable advice, Don? There is a HUGE price difference between SSD and normal hard disks. Never mind the following disadvantages. Quoting (http://en.wikipedia.org/wiki/Solid_state_disk):
# Price - As of early 2007, flash memory prices are still considerably higher per gigabyte than those of comparable conventional hard drives - around $10 per GB compared to about $0.25 for mechanical drives.

Comment: Prices for DDR RAM based systems are actually higher than this, with a typical list price around $1000 per GB. Your concern, however, is not price per capacity but price for performance. How many spindles will you have to spread your redo log across to get the performance that you need? How much impact are the redo logs having on your RAID cache effectiveness? Our system is obviously geared to the enterprise, where Oracle is supporting mission-critical databases and a huge return can be made on accelerating Oracle.
# Capacity - The capacity of SSDs tends to be significantly smaller than the capacity of HDDs.

Comment: This statement is true. Comparing an individual hard disk drive with an individual solid state disk, you can typically get higher storage density with the hard disk drive. However, if your goal is redo log acceleration, storage capacity is not your bottleneck; write performance can be. Keep in mind that, just as with any storage media, you can deploy an array of solid state disks that provides terabytes of capacity (with either DDR or flash).
# Lower recoverability - After mechanical failure the data is completely lost as the cell is destroyed, while if a normal HDD suffers mechanical failure the data is often recoverable using expert help.

Comment: If you lose a hard drive holding your redo log, the last thing you are likely to do is have a disk restoration company partially restore your data; you ought to be rebuilding the failed disk from your mirror or RAID. Similarly, with solid state disks (flash or DDR), we recommend host-based mirroring to provide enterprise levels of reliability. In our experience, a DDR based solid state disk has a failure rate equal to the odds of losing two hard disk drives in a RAID set.
# Vulnerability against certain types of effects, including abrupt power loss (especially DRAM based SSDs), magnetic fields and electric/static charges, compared to normal HDDs (which store the data inside a Faraday cage).

Comment: This statement is all FUD. For example, our DDR RAM based systems have redundant power supplies, N+1 redundant batteries, and four RAID-protected hard disk drives for data backup. The memory is ECC protected and Chipkill protected.
# Slower than conventional disks on sequential I/O

Comment: Most flash drives will be slower on sequential I/O than a hard disk drive (to really understand this you should know there are different kinds of flash memory, which also affect flash performance). DDR RAM based systems, however, offer enormous performance benefits versus hard disk or flash based systems for sequential or random writes. DDR RAM systems can handle over 400,000 random write I/Os per second (the number is slightly higher for sequential access). We would be happy to share some Oracle ORION benchmark data with you to make the point. For redo logs on a heavily transactional system, the latency of the redo log storage can be the ultimate limit on the database.
# Limited write cycles. Typical flash storage will wear out after 100,000-300,000 write cycles, while high-endurance flash storage is often marketed with an endurance of 1-5 million write cycles (many log files, file allocation tables, and other commonly used parts of the file system exceed this over the lifetime of a computer). Special file systems or firmware designs can mitigate this problem by spreading writes over the entire device rather than rewriting files in place.

Comment: This statement is mostly accurate but refers only to flash drives. DDR RAM based systems, such as those Don's books refer to, do not have this limitation.
> Looking at many of your postings to Oracle Forums thus far Don, it seems to me that you are less interested in providing actual practical help, and more interested in self-promotion - of your company and the Oracle books produced by it.
> ... and that is not a very nice approach when people post real problems wanting real-world practical advice and suggestions.

Comment: Contact us and we will see if we can prove to you that Don, and any number of other reputable Oracle consultants, recommend using DDR based solid state disk to solve redo log performance issues. In fact, if it looks like your system can see a serious performance increase, we would be happy to put you on our evaluation program to try it out, so that you can do it at no cost from us. -
Question regarding DocumentDB RU consumption when inserting documents & write performance
Hi guys,
I have some questions regarding the DocumentDB Public Preview capacity and performance quotas.
My use case is the following:
I need to store about 200,000,000 documents per day with a maximum of about 5,000 inserts per second. Each document has a size of about 200 bytes.
According to the documentation (http://azure.microsoft.com/en-us/documentation/articles/documentdb-manage/) I understand that I should be able to store about 500 documents per second with single inserts and about 1,000 per second with a batch insert using a stored procedure. This would result in the need of at least 5 CUs just to handle the inserts.
Since one CU consists of 2,000 RUs I would expect the RU usage to be about 4 RUs per single document insert, or 100 RUs for a single SP execution with 50 documents.
When I look at the actual RU consumption I get values I don't really understand:
Batch insert of 50 documents: about 770 RUs
Single insert: about 17 RUs
Example document:
{"id":"5ac00fa102634297ac7ae897207980ce","Type":0,"h":"13F40E809EF7E64A8B7A164E67657C1940464723","aid":4655,"pid":203506,"sf":202641580,"sfx":5662192,"t":"2014-10-22T02:10:34+02:00","qg":3}
The consistency level is set to "Session".
I am using the SP from the example C# project for batch inserts and the following code snippet for single inserts:
await client.CreateDocumentAsync(documentCollection.DocumentsLink, record);
Is there any flaw in my assumptions (ok... obviously) regarding the throughput calculation, or could you give me some advice on how to achieve the throughput stated in the documentation?
With the current performance I would need to buy at least 40 CUs, which wouldn't be an option at all.
I have another question regarding document retention:
Since I would need to store a lot of data per day, I would also need to delete as much data per day as I insert:
The data is valid for at least 7 days (it actually should be 30 days, depending on my options with DocumentDB).
I guess there is nothing like a retention policy for documents (this document is valid for X days and will automatically be deleted after that period)?
Since deleting data on a single-document basis is no option at all, I would like to create a document collection per day and delete the collection after a specified retention period.
Those historic collections would never change but would only receive queries. The only problem I see with creating collections per day is the missing throughput:
As I understand it, the throughput is split equally across the available collections, which would result in "missing" throughput on the actual hot collection (hot meaning the only collection I would actually insert documents into).
Is there any (better) way to handle this use case than buying enough CUs so that the actual hot collection gets the needed throughput?
Example:
1 CU -> 2,000 RUs
7 collections -> 2,000 / 7 = 286 RUs per collection (per CU)
Needed throughput for the hot collection (values from the documentation): 20,000 RUs
=> 70 CUs (20,000 / 286)
vs. 10 CUs when using one collection and batch inserts, or 20 CUs when using one collection and single inserts.
I know that DocumentDB is currently in preview and that it is not possible to handle this use case as-is because of the current limit of 10 GB per collection. I am just trying to do a POC so I can switch to DocumentDB when it is publicly available.
Could you give me any advice on whether this kind of use case can, or should, be handled with DocumentDB? I currently use Table Storage for this case (currently with a maximum of about 2,500 inserts per second) but would like to switch to DocumentDB, since I had to optimize for writes per second with Table Storage and have horrible query execution times because of full table scans.
Once again my desired setup:
200,000,000 inserts per day / maximum of 5,000 writes per second
Collection 1.2 -> Hot Collection: All writes (max 5000 p/s) will go to this collection. Will also be queried.
Collection 2.2 -> Historic data, will only be queried; no inserts
Collection 3.2 -> Historic data, will only be queried; no inserts
Collection 4.2 -> Historic data, will only be queried; no inserts
Collection 5.2 -> Historic data, will only be queried; no inserts
Collection 6.2 -> Historic data, will only be queried; no inserts
Collection 7.2 -> Historic data, will only be queried; no inserts
Collection 1.1 -> Old, so delete whole collection
As a matter of fact, the perfect setup would be to have only one (huge) collection with automatic document retention... but I guess this won't be an option at all?
I hope you understand my problem. Please give me some advice on whether this is at all possible, or will be possible in the future, with DocumentDB.
Best regards and thanks for your help

Hi Aravind,
first of all thanks for your reply regarding my questions.
I sent you a mail a few days ago, but since I did not receive a response I am not sure it got through.
My main question regarding the actual usage of RUs when inserting documents is still my main concern, since I cannot insert nearly as many documents per second and CU as expected.
According to the documentation (http://azure.microsoft.com/en-us/documentation/articles/documentdb-manage/) I understand that I should be able to store about 500 documents per second with single inserts and about 1,000 per second with a batch insert using a stored procedure (20 batches per second containing 50 documents each).
As described in my post, the actual usage is multiple (actually 6-7) times higher than expected... even when running the C# examples provided at:
https://code.msdn.microsoft.com/windowsazure/Azure-DocumentDB-NET-Code-6b3da8af/view/SourceCode
I tried all the ideas Steve posted (manual indexing & lazy indexing mode) but was not able to reduce RU consumption to a point where 500 inserts per second were nearly possible.
Here again my findings regarding RU consumption for batch inserts:
Automatic indexing on: 777 RUs for 50 documents
Automatic indexing off & mandatory path only: 655 RUs for 50 documents
Automatic indexing off & IndexingMode Lazy & mandatory path only: 645 RUs for 50 documents
Expected result: approximately 100 RUs (2,000 RUs => 20x batch insert of 50 => 100 RUs per batch)
Since DocumentDB is still in preview I understand that it is not yet capable of handling my use case regarding throughput, collection size, number of collections and possible CUs, and I am fine with that.
If I am able to (at least nearly) reach the stated performance of 500 inserts per second per CU, I am totally fine for now. If not, I have to move on and look for other options... which would also be "fine". ;-)
Is there actually any working example code that manages to do 500 single inserts per second with one CU's 2,000 RUs, or is this a totally theoretical value? Or is it just because of being Preview, and the stated values are planned to work?
Regarding your feedback:
"...another thing to consider is if you can amortize the request rate over the average of 200M requests/day = 2,000 requests/second, then you'll need to provision 16 capacity units instead of 40 capacity units. You can do this by catching "RequestRateTooLargeExceptions" and retrying after the server specified retry interval..."
Sadly this is not possible for me, because I have to query the data in near real time for my use case... so queuing is not an option.
"We don't support a way to distribute throughput differently across hot and cold collections. We are evaluating a few solutions to enable this scenario, so please do propose it as a feature at http://feedback.azure.com/forums/263030-documentdb as this helps us prioritize feature work. Currently, the best way to achieve this is to create multiple collections for hot data, and shard across them, so that you get more proportionate throughput allocated to it."
I guess I could circumvent this by not clustering in "hot" and "cold" collections but in "hot" and "cold" databases with one or multiple collections each (if 10GB remains the limit per collection), if there were a way to (automatically?) scale the CUs via an API. Otherwise I would have to manually scale down the DBs holding historic data. I also added a feature request as proposed by you.
Sorry for the long post, but I am planning the future architecture for one of our core systems and want to be sure I am on the right track.
So if you would be able to answer just one question, it would be this:
How do I achieve the stated throughput of 500 single inserts per second with one CU's 2,000 RUs in reality? ;-)
Best regards and thanks again -
How to improve the write performance of the database
Our application is write-intensive; it may write 2M/second of data to the database. How can we improve the performance of the database? We mainly write to 5 tables.
Currently the database gets no response and the CPU is 100% used.
How do we tune this? Thanks in advance.

Your post says more by what is not provided than by what is provided. The following is the minimum list of information needed to even begin to help you.
1. What hardware (server, CPU, RAM, and NIC and HBA cards if any pointing to storage).
2. Storage solution (DAS, iSCSI, SAN, NAS). Provide manufacturer and model.
3. If RAID which implementation of RAID and on how many disks.
4. If NAS or SAN how is the read-write cache configured.
5. What version of Oracle software ... all decimal points ... for example 11.1.0.6. If you are not fully patched then patch it and try again before asking for help.
6. What, in addition to the Oracle database, is running on the server?
2MB/sec. is very little. That is equivalent to inserting 500 VARCHAR2(4000)s per second. If I couldn't do 500 inserts per second on my laptop, I'd trade it in.
SQL> create table t (
2 testcol varchar2(4000));
Table created.
SQL> set timing on
SQL> BEGIN
2 FOR i IN 1..500 LOOP
3 INSERT INTO t SELECT RPAD('X', 3999, 'X') FROM dual;
4 END LOOP;
5 END;
6 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.07
SQL>

Now what to do with the remaining 0.93 seconds? <g> And this was on a Lenovo T61 with a slow little 7500RPM drive and 4GB RAM, running Oracle Database 11.2.0.1. But I will gladly repeat it using any currently supported version of the product. -
Optimal read/write performance for data with duplicate keys
Hi,
I am constructing a database that will store data with duplicate keys.
For each key (a String) there will be multiple data objects, there is no upper limit to the number of data objects, but let's say there could be a million.
Data objects have a time-stamp (Long) field and a message (String) field.
At the moment I write these data objects into the database in chronological order, as I receive them, for any given key.
When I retrieve data for a key, and iterate across the duplicates for any given primary key using a cursor they are fetched in ascending chronological order.
What I would like to do is start fetching these records in reverse order, say just the last 10 records that were written to the database for a given key, and was wondering if anyone had some suggestions on the optimal way to do this.
I have considered writing data out in the order that I want to retrieve it, by supplying the database with a custom duplicate comparator. If I were to do this then the query above would return the latest data first, and I would be able to iterate over the most recent inserts quickly. But is there a performance penalty paid on writing to the database if I do this?
I have also considered using the time-stamp field as the unique primary key for the primary database instead of the String, and creating a secondary database for the String, this would allow me to index into the data using a cursor join, but I'm not certain it would be any more performant, at least not on writing to the database, since it would result in a very flat b-tree.
Is there a fundamental choice that I will have to make between write versus read performance? Any suggestions on tackling this much appreciated.
Many Thanks,
Joel

Hi Joel,
Using a duplicate comparator will slow down Btree access (writes and reads) to some degree because the comparator is called a lot during searching. But whether this is a problem depends on whether your app is CPU bound and how much CPU time your comparator uses. If you can avoid de-serializing the object in the comparator, that will help. For example, if you keep the timestamp at the beginning of the data and only read the one long timestamp field in your comparator, that should be pretty fast.
Another approach is to store the negation of the timestamp so that records are sorted naturally in reverse timestamp order.
Another approach is to read backwards using a cursor. This takes a couple of steps:
1) Find the last duplicate for the primary key you're interested in:
cursor.getSearchKey(keyOfInterest, ...)
status = cursor.getNextNoDup(...)
if (status == SUCCESS) {
    // Found the next primary key, now back up one record.
    status = cursor.getPrev(...)
} else {
    // This is the last primary key, find the last record.
    status = cursor.getLast(...)
}
2) Scan backwards over the duplicates:
while (status == SUCCESS) {
    // Process one record
    // Move backwards
    status = cursor.getPrev(...)
}
Finally, another approach is to use a two-part primary key: {string,timestamp}. Duplicates are not configured because every key is unique. I mention this because using duplicates in JE has more overhead than using a unique primary key. You can combine this with either of the above approaches -- using a comparator, negating the timestamp, or scanning backwards.
--mark -
How to write - Perform using variable changing tables
hi Gurus,
I am facing an issue while writing a perform statement in my code.
PERFORM get_pricing(zvbeln) USING nast-objky
CHANGING gt_komv
gt_vbap
gt_komp
gt_komk.
in program zvbeln :-
FORM get_pricing USING p_nast_objky TYPE nast-objky
tables p_gt_komv type table komv
p_gt_vbap type table vbapvb
p_gt_komp type table komp
p_gt_komk type table komk.
BREAK-POINT.
DATA: lv_vbeln TYPE vbak-vbeln.
MOVE : p_nast_objky TO lv_vbeln.
CALL FUNCTION '/SAPHT/DRM_ORDER_PRC_READ'
EXPORTING
iv_vbeln = lv_vbeln
TABLES
et_komv = p_gt_komv
et_vbap = p_gt_vbap
et_komp = p_gt_komp
et_komk = p_gt_komk.
ENDFORM. " GET_PRICING
But it's giving an error. Please let me know how I can solve this.

Hi,
Please incorporate these changes and try.
PERFORM get_pricing(zvbeln) TABLES gt_komv gt_vbap gt_komp gt_komk
                            USING nast-objky.
in program zvbeln:
Form get_pricing TABLES p_gt_komv type table komv
p_gt_vbap type table vbapvb
p_gt_komp type table komp
p_gt_komk type table komk
USING p_nast_objky TYPE nast-objky.
Rest of the code stays the same.
ENDFORM.
Note : Please check lv_vbeln after the move statement.
Hope this will help you.
Regards,
Smart Varghese -
If I want to use the fast read/write speeds of my SSD, do I have to install my programs (Final Cut Pro X, Adobe After Effects, Microsoft PowerPoint, Steam, Counter-Strike, etc.) on the SSD? Or can I put all programs on a 2nd drive, an HDD, and still have fast speeds WITHOUT storing those programs on the SSD? My mid-2012 MacBook Pro has an SSD as its main boot drive and will be getting a 2TB HDD to replace the optical drive.
Any advice or comments are greatly appreciated,
Thank you!

Sorry for my late reply.
Techinspace: I'm on the latest Yosemite, 10.10.2, and everything is always up to date.
babowa: My login items are two: ColorNavigator 6 and cDock Agent. One is for color calibration, the other is a Dock customization app.
I have AdwareMedic, but it is not an auto-starting program. I also have Kaspersky Internet Security, because my system is dual-boot and I need to keep the Windows drive clean. Nothing else.
I verified and repaired the disk permissions and found one problem there:
Warning: SUID file "System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/MacOS/ARDAgent" has been modified and will not be repaired.
Apple advises that this message be ignored: Mac OS X: Disk Utility's Repair Disk Permissions messages that you can safely ignore - Apple Support
EtreCheck version: 2.1.8 (121)
Hardware Information: ℹ️
MacBook Pro (Retina, 15-inch, Mid 2014) (Technical Specifications)
MacBook Pro - model: MacBookPro11,3
1 2.8 GHz Intel Core i7 CPU: 4-core
16 GB RAM Not upgradeable
BANK 0/DIMM0
8 GB DDR3 1600 MHz ok
BANK 1/DIMM0
8 GB DDR3 1600 MHz ok
Bluetooth: Good - Handoff/Airdrop2 supported
Wireless: en0: 802.11 a/b/g/n/ac
Battery Health: Normal - Cycle count 138
Video Information: ℹ️
Intel Iris Pro
NVIDIA GeForce GT 750M - VRAM: 2048 MB
Color LCD spdisplays_2880x1800Retina
CX240 1920 x 1200
System Software: ℹ️
OS X 10.10.2 (14C1514) - Time since boot: 0:34:49
Disk Information: ℹ️
APPLE SSD SM1024F disk0 : (1 TB)
EFI (disk0s1) <not mounted> : 210 MB
Macintosh HD (disk0s2) / : 698.70 GB (224.98 GB free)
Recovery HD (disk0s3) <not mounted> [Recovery]: 650 MB
BOOTCAMP (disk0s4) /Volumes/BOOTCAMP : 301.00 GB (122.94 GB free)
USB Information: ℹ️
Apple Internal Memory Card Reader
VIA Labs, Inc. USB3.0 Hub
VIA Labs, Inc. USB3.0 Hub
Western Digital Elements 1048 2 TB
HFS (disk1s1) /Volumes/HFS : 1.32 TB (488.20 GB free)
XFat (disk1s2) /Volumes/XFat : 679.86 GB (178.77 GB free)
VIA Labs, Inc. USB3.0 Hub
Apple Inc. BRCM20702 Hub
Apple Inc. Bluetooth USB Host Controller
Apple Inc. Apple Internal Keyboard / Trackpad
VIA Labs, Inc. USB2.0 Hub
Logitech USB Keyboard
VIA Labs, Inc. USB2.0 Hub
EIZO EIZO USB HID Monitor
VIA Labs, Inc. USB2.0 Hub
Datacolor Datacolor Spyder4
©Microsoft Corporation Controller
Tablet PTK-440
Thunderbolt Information: ℹ️
Apple Inc. thunderbolt_bus
Configuration files: ℹ️
/etc/hosts - Count: 22
Gatekeeper: ℹ️
Mac App Store and identified developers
Kernel Extensions: ℹ️
/Applications/VMware Fusion.app
[not loaded] com.vmware.kext.vmci (90.5.7) [Click for support]
[not loaded] com.vmware.kext.vmioplug.12.1.17 (12.1.17) [Click for support]
[not loaded] com.vmware.kext.vmnet (0188.79.83) [Click for support]
[not loaded] com.vmware.kext.vmx86 (0188.79.83) [Click for support]
[not loaded] com.vmware.kext.vsockets (90.5.7) [Click for support]
/Library/Application Support/Kaspersky Lab/KAV/Bases/Cache
[loaded] com.kaspersky.kext.kimul.44 (44) [Click for support]
[loaded] com.kaspersky.kext.mark.1.0.5 (1.0.5) [Click for support]
/Library/Extensions
[not loaded] com.Logitech.Control Center.HID Driver (3.9.1 - SDK 10.8) [Click for support]
[loaded] com.kaspersky.kext.klif (3.0.5a45) [Click for support]
[loaded] com.kaspersky.nke (2.0.0a12) [Click for support]
/System/Library/Extensions
[not loaded] com.Logitech.Unifying.HID Driver (1.3.0 - SDK 10.6) [Click for support]
[not loaded] com.basICColor.driver.basICColorDISCUS (1.0.0 - SDK 10.4) [Click for support]
[loaded] com.mice.driver.Xbox360Controller (1.0.0d13 - SDK 10.8) [Click for support]
[loaded] com.nvidia.CUDA (1.1.0) [Click for support]
[not loaded] com.wacom.kext.wacomtablet (6.3.8 - SDK 10.9) [Click for support]
/System/Library/Extensions/360Controller.kext/Contents/PlugIns
[not loaded] com.mice.driver.Wireless360Controller (1.0.0d13 - SDK 10.8) [Click for support]
[not loaded] com.mice.driver.WirelessGamingReceiver (1.0.0d13 - SDK 10.8) [Click for support]
Startup Items: ℹ️
TuxeraNTFSUnmountHelper: Path: /Library/StartupItems/TuxeraNTFSUnmountHelper
Startup items are obsolete in OS X Yosemite
Launch Agents: ℹ️
[not loaded] com.adobe.AAM.Updater-1.0.plist [Click for support]
[loaded] com.adobe.AD4ServiceManager.plist [Click for support]
[running] com.kaspersky.kav.gui.plist [Click for support]
[loaded] com.nvidia.CUDASoftwareUpdate.plist [Click for support]
[running] com.wacom.wacomtablet.plist [Click for support]
Launch Daemons: ℹ️
[loaded] com.adobe.fpsaud.plist [Click for support]
[running] com.bombich.ccchelper.plist [Click for support]
[running] com.kaspersky.kav.plist [Click for support]
[running] com.mice.360Daemon.plist [Click for support]
[loaded] com.nvidia.cuda.launcher.plist [Click for support]
User Launch Agents: ℹ️
[loaded] com.adobe.AAM.Updater-1.0.plist [Click for support]
[loaded] com.google.keystone.agent.plist [Click for support]
User Login Items: ℹ️
ColorNavigator 6 Application (/Applications/ColorNavigator 6.app)
cDock Agent Application (/Applications/cDock.app/Contents/Resources/helpers/cDock Agent.app)
Internet Plug-ins: ℹ️
AdobeAAMDetect: Version: AdobeAAMDetect 1.0.0.0 - SDK 10.6 [Click for support]
FlashPlayer-10.6: Version: 16.0.0.305 - SDK 10.6 [Click for support]
QuickTime Plugin: Version: 7.7.3
Flash Player: Version: 16.0.0.305 - SDK 10.6 Outdated! Update
Default Browser: Version: 600 - SDK 10.10
Silverlight: Version: 5.1.30514.0 - SDK 10.6 [Click for support]
WacomTabletPlugin: Version: WacomTabletPlugin 2.1.0.6 - SDK 10.9 [Click for support]
JavaAppletPlugin: Version: 15.0.0 - SDK 10.10 Check version
Safari Extensions: ℹ️
Virtual Keyboard
URL Advisor
3rd Party Preference Panes: ℹ️
CUDA Preferences [Click for support]
Flash Player [Click for support]
Tuxera NTFS [Click for support]
WacomTablet [Click for support]
XBox 360 Controllers [Click for support]
Time Machine: ℹ️
Skip System Files: NO
Mobile backups: ON
Auto backup: YES
Volumes being backed up:
Macintosh HD: Disk size: 698.70 GB Disk used: 473.72 GB
Destinations:
Os X [Local]
Total size: 1.32 TB
Total number of backups: 39
Oldest backup: 2015-01-27 20:51:11 +0000
Last backup: 2015-03-31 01:29:52 +0000
Size of backup disk: Too small
Backup size 1.32 TB < (Disk used 473.72 GB X 3)
Top Processes by CPU: ℹ️
4% backupd
4% firefox
4% WindowServer
2% VLC
1% coreaudiod
Top Processes by Memory: ℹ️
464 MB firefox
344 MB kav
223 MB Dock
172 MB mds_stores
155 MB WindowServer
Virtual Memory Information: ℹ️
10.93 GB Free RAM
3.24 GB Active RAM
1.20 GB Inactive RAM
1.79 GB Wired RAM
3.46 GB Page-ins
0 B Page-outs
Diagnostics Information: ℹ️
Mar 31, 2015, 11:33:40 PM Self test - passed -
Oracle 9i - my application report writer performs reports very slowly
Hi,
To tune the performance, where do I find the pfile?
What kind of tuning should be done, and how?
Please advise...
Regards,
frz

Hi,
Use the Documentation forum to report broken links or general feedback about Oracle documentation.
For better/faster response, please post your question in the Database forum.
Database
http://forums.oracle.com/forums/category.jspa?categoryID=18
Regards,
Hussein -
I am writing log data to a TDMS file. We log one sample for each of 16 channels at the beginning and end of each step in a test profile and every minute in between.
The profile has 640 steps, most shorter than one minute. This results in approximately 400,000 writes to the TDMS file with only one data point per channel. As a result the file is very fragmented and takes a long time to open or defragment.
In this post Brad Turpin mentions a fix that works well but greatly diminishes the TDMS write performance:
http://forums.ni.com/ni/board/message?board.id=170&message.id=403179&query.id=7209265#M403179
I also found that it takes about 40 seconds to set the NI_MinimumBufferSize attribute on 10,240 channels. (640 groups * 16 channels)
I did test this and it works very well, but it took hours to generate a file of dummy data using this method. Generating the dummy data file with the same number of writes, but without the buffer size attribute, took seconds.
In this post Brad also mentioned that LV 2009 contains TDMS VIs with an updated option to create a single binary header for the entire file.
I have not been able to find any more references to this, nor have I found the attribute to set this functionality.
Does anybody know how to set this attribute or have any suggestions on how to better deal with my file structure?
Thanks,
Dave

Are you writing one value per channel for all 16 channels with a single call to TDMS Write, or are you calling TDMS Write 16 times for that? Consolidating this into one call to TDMS Write should improve performance and fragmentation quite a bit. In addition to that, you could set NI_MinimumBufferSize to the maximum number of values a channel in a step can have. After each step you could call TDMS Flush in order for these buffers to be flushed to disk. That should further reduce fragmentation.
The feature Brad mentioned is used in 2009 automatically, you don't need to enable it. Unfortunately, that won't do much for your application, because you create new channels for every step, which results in a new binary header for every step. The 2009 improvements would only kick in if you would use the same channels all the way through.
We are currently working on some improvements to the TDMS API that will help making your use case a lot more efficient. These are not yet available to customers though, so I'll describe some ways of working around the issue. It's not going to be pretty, but it'll work.
1) The TDM Streaming API is built for high performance, but even more than that, it is built so whatever you do with it, it will always create a valid TDMS file. These safety measures come at a cost, especially if you have a large number of channels and/or properties versus a rather small number of values per channel. In order to better address use cases like that for the time being, we have published a set of VIs a while ago, which will write TDMS files based on LabVIEW File I/O functions. This API does a lot less data processing in the background than the built-in API and therefore is a lot more efficient tackling use cases with thousands of channels.
2) Another way of improving performance during your test steps is to push some of the tasks that cost a lot of performance out into a post processing step. You can merge multiple TDMS files by concatenating them on a file system level. A possible workaround would be to write one TDMS file per step or maybe one every 10 or 100 steps and merge them after the test is done. An example VI for concatenating TDMS files can be found here.
Hope that helps,
Herbert
Message Edited by Herbert Engels on 05-12-2010 11:33 AM -
Performance problems when running PostgreSQL on ZFS and tomcat
Hi all,
I need help with some analysis and problem solution related to the below case.
The long story:
I'm running into some massive performance problems on two 8-way HP ProLiant DL385 G5 servers with 14 GB RAM and a ZFS storage pool in raidz configuration. The servers are running Solaris 10 x86 10/09.
The configuration between the two is pretty much the same and the problem therefore seems generic for the setup.
Within a non-global zone I'm running a Tomcat application (an institutional repository) connecting via localhost to a PostgreSQL database (the OS-provided version). The processor load is typically not very high, as seen below:
NPROC USERNAME SWAP RSS MEMORY TIME CPU
49 postgres 749M 669M 4,7% 7:14:38 13%
1 jboss 2519M 2536M 18% 50:36:40 5,9%

We are not 100% sure why we run into performance problems, but when it happens we experience that the application slows down and swaps out (see below). When it settles, everything seems to return to normal. When the problem is acute the application is totally unresponsive.
NPROC USERNAME SWAP RSS MEMORY TIME CPU
1 jboss 3104M 913M 6,4% 0:22:48 0,1%
#sar -g 5 5
SunOS vbn-back 5.10 Generic_142901-03 i86pc 05/28/2010
07:49:08 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
07:49:13 27.67 316.01 318.58 14854.15 0.00
07:49:18 61.58 664.75 668.51 43377.43 0.00
07:49:23 122.02 1214.09 1222.22 32618.65 0.00
07:49:28 121.19 1052.28 1065.94 5000.59 0.00
07:49:33 54.37 572.82 583.33 2553.77 0.00
Average 77.34 763.71 771.43 19680.67 0.00

Making more memory available to Tomcat seemed to worsen the problem, or at least didn't prove to have any positive effect.
My suspicion is currently focused on PostgreSQL. Turning off fsync boosted performance and made the problem appear less often.
An unofficial performance evaluation of the database with VACUUM ANALYZE took 19 minutes on the server and only 1 minute on a desktop PC. This is horrific when taking the hardware into consideration.
The short story:
I'm trying different steps but running out of ideas. We've read that the database block size and the file system block size should match: PostgreSQL uses 8 KB and ZFS 128 KB. I didn't find much information on the matter, so if anyone can help, please recommend how to make this change.
Any other recommendations or ideas we could follow? We know from other installations that the above setup runs without a single problem on Linux on much smaller hardware, without specific tuning. What makes Solaris in this configuration so darn slow?
Any help is appreciated, and I will gladly provide additional information on request.
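For reference, the PostgreSQL block size can be confirmed from psql; it is a compile-time constant (a sketch):

SHOW block_size;   -- reports the page size in bytes, typically 8192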
Thanks in advance,
Kasper

raidz isn't a good match for databases. Databases tend to require good write performance, for which mirroring works better.
Adding a pair of SSDs as a ZIL would probably also help, but chances are it's not an option for you.
You can change the record size with "zfs set recordsize=8k <dataset>".
It will only take effect for newly written data, not existing data. -
Has anyone else seen Windows Server 2012 Storage Spaces with a Simple (RAID 0) virtual disk (it also happens with Mirrored RAID 1 and Parity RAID 5) exhibiting extremely slow read speeds of 5MB/sec, yet normal write performance of 650MB/sec in RAID 0?
Windows Server 2012 Standard
Intel i7 CPU and Motherboard
LSI 9207-8e 6Gb SAS JBOD Controller with latest firmware/BIOS and Windows driver.
(4) Hitachi 4TB 6Gb SATA Enterprise Hard Disk Drives HUS724040ALE640
(4) Hitachi 4TB 6Gb SATA Desktop Hard Disk Drives HDS724040ALE640
Hitachi drives are directly connected to LSI 9207-8e using a 2-meter SAS SFF-8088 to eSATA cable to six-inch eSATA/SATA adapter.
The Enterprise drives are on LSI's compatibility list. The Desktop drives are not, but regardless, both drive models are affected by the problem.
Interestingly, this entire configuration, but with two SIIG eSATA 2-port adapters instead of the LSI 9207-8e, works perfectly, with both reads and writes at 670MB/sec.
I thought SAS was going to be a sure bet for expanding beyond the capacity of port limited eSATA adapters, but after a week of frustration and spending over $5,000.00 on drives, controllers and cabling, it's time to ask for help!
Any similar experiences or solutions?Has anyone else seen Windows Server 2012 Storage Spaces with a Simple RAID 0 (also happens with Mirrored RAID 1 and Parity RAID 5) virtual disk exhibiting extremely slow read speed of 5Mb/sec, yet write performance is normal at 650Mb/sec in RAID 0?
Windows Server 2012 Standard
Intel i7 CPU and Motherboard
LSI 9207-8e 6Gb SAS JBOD Controller with latest firmware/BIOS and Windows driver.
(4) Hitachi 4TB 6Gb SATA Enterprise Hard Disk Drives HUS724040ALE640
(4) Hitachi 4TB 6Gb SATA Desktop Hard Disk Drives HDS724040ALE640
Hitachi drives are directly connected to LSI 9207-8e using a 2-meter SAS SFF-8088 to eSATA cable to six-inch eSATA/SATA adapter.
The Enterprise drives are on LSI's compatibility list. The Desktop drives are not, but regardless, both drive models are affected by the problem.
Interestingly, this entire configuration but with two SIIG eSATA 2-Port adapters instead of the LSI 9207-8e, works perfectly with both reads and writes at 670Mb/sec.
I thought SAS was going to be a sure bet for expanding beyond the capacity of port limited eSATA adapters, but after a week of frustration and spending over $5,000.00 on drives, controllers and cabling, it's time to ask for help!
Any similar experiences or solutions?
1) Yes, being slow either on reads or on writes is quite a common situation for Storage Spaces. See these references (with some of the solutions, I hope):
http://social.technet.microsoft.com/Forums/en-US/winserverfiles/thread/a58f8fce-de45-4032-a3ef-f825ee39b96e/
http://blogs.technet.com/b/askpfeplat/archive/2012/10/10/windows-server-2012-storage-spaces-is-it-for-you-could-be.aspx
http://social.technet.microsoft.com/Forums/en-US/winserver8gen/thread/64aff15f-2e34-40c6-a873-2e0da5a355d2/
and this one is my favorite, shedding a lot of light on the issue:
http://helgeklein.com/blog/2012/03/windows-8-storage-spaces-bugs-and-design-flaws/
2) Issues with SATA-to-SAS hardware are also very common. See:
http://social.technet.microsoft.com/Forums/en-US/winserverClustering/thread/5d4f68b7-5fc4-4a3c-8232-a2a68bf3e6d2
StarWind iSCSI SAN & NAS -
Hyper-V over SMB 3.0 poor performance on 1GB NIC's without RDMA
This is a bit of a repost, as the last time I tried to troubleshoot this my question got hijacked by people spamming alternative solutions (StarWind).
For my own reasons I am currently evaluating Hyper-V over SMB with a view to designing our new production cluster on this technology. Given our budget and resources, a SoFS makes perfect sense.
The problem I have is that in all my testing, as soon as I host a VM's files on an SMB 3.0 server (SoFS or standalone), I do not get the performance I should over the network.
My testing so far:
4 different decent-spec machines with 4-8GB RAM and dual/quad-core CPUs.
Test machines are mostly Server 2012 R2, with one Windows 8.1 Hyper-V host thrown in for good measure.
Storage is a variety of HDDs and SSDs, easily capable of handling >100MB/s of traffic and 5k+ IOPS.
Storage has been tested standalone and as Storage Spaces (mirrored, spanned and with tiering).
All storage is performing as expected in each configuration.
Multiple 1GB NICs from Broadcom, Intel and Atheros. The Broadcoms are server-grade dual-port adapters.
Switching has been a combination of HP E5400zl, HP 2810 and even direct connection with crossover cables.
Have tried standalone NICs, teamed NICs and even storage through the Hyper-V extensible switch.
File copies between machines will easily max out 1GB in any direction.
VMs hosted locally show internal benchmark performance in line with roughly 90% of the underlying storage performance.
Tested with dynamic and fixed vhdx files.
NICs have been used with RSS and TCP offload enabled/disabled in various combinations.
Whenever I host VM files on a different server from where the VM is running, I observe the following:
Write speeds within the VM to any attached vhdx are severely affected and run at around 30-50% of 1GB.
Read speeds are not as badly affected but only just manage to hit 70% of 1GB.
Random IOPS are not noticeably affected.
Running multiple tests at the same time over the same 1GB links results in the same total throughput.
The same results are observed no matter which machine hosts the VM or the vhdx files.
Any host involved in a test will show a healthy amount of CPU time allocated to hardware interrupts. On a 6-core 3.8GHz CPU this is around 5% of total; on the slowest machine (dual-core 2.4GHz) it is roughly 30% of CPU load.
Things I have yet to test:
Gen 1 VM's
VM's running anything other than server 2012 r2
Running the tests on actual server hardware. (hard as most of ours are in production use)
Is there a default QoS or IOPS limit when SMB detects Hyper-V traffic? I just can't wrap my head around how all the tests hit an identical bottleneck as soon as the storage traffic goes over SMB.
What else should I be looking for? There must be something obvious that I am overlooking!

By nature of a SoFS, reads are really good, but there is no write cache; SoFS only seems to perform well with disk mirroring, which improves write performance and redundancy but halves your disk capacity.
Mirror (RAID1 or RAID10) actually REDUCES the number of IOPS. With reads, every spindle takes part in I/O request processing (assuming the I/O is big enough to cover the stripe), so you multiply IOPS and MBps by the number of spindles in the RAID group; but all writes need to go to the duplicated locations. That's why READS are fast and WRITES are slow (1/2 of the read performance). This is an absolutely basic thing, and SoFS layered on top can do nothing to change it.
StarWind iSCSI SAN & NAS
Not wanting to put the cat amongst the pigeons, but this isn't strictly true: RAID 1 and 10 give you the best IOPS performance of any RAID group, which is why the best-performing SQL clusters use RAID 10 for most of their storage requirements:
Feature                        RAID 0          RAID 1          RAID 1E         RAID 5          RAID 5EE
Minimum # drives               2               2               3               3               4
Data protection                None            Single-drive    Single-drive    Single-drive    Single-drive
                                               failure         failure         failure         failure
Read performance               High            High            High            High            High
Write performance              High            Medium          Medium          Low             Low
Read performance (degraded)    N/A             Medium          High            Low             Low
Write performance (degraded)   N/A             High            High            Low             Low
Capacity utilization           100%            50%             50%             67%-94%         50%-88%

Typical applications:
RAID 0 - high-end workstations, data logging, real-time rendering, very transitory data
RAID 1 / RAID 1E - operating system, transaction databases
RAID 5 / RAID 5EE - data warehousing, web serving, archiving