Distributed Cache Host Down

When I start the cache host on the server with the command below,
Start-CacheHost -Computername "name" -CachePort 22233
it immediately displays:
HostName : CachePort   Service Name              Service Status   Version Info
name:22233             AppFabricCachingService   DOWN             3 [3,3][1,3]
When I checked the logs, I saw:
AppFabricCachingService.Crash
Param System.UriFormatException: Invalid URI: The hostname could not be parsed. at System.Uri.CreateThis
I am able to ping the hostname.
Can somebody point out what is missing?

I tried running the command
cmdlet Get-AFCacheHostConfiguration at command pipeline position 1
Supply values for the following parameters:
ComputerName: ABC1
CachePort: 22233
HostName : ABC1
ClusterPort : 22234
CachePort : 22233
ArbitrationPort : 22235
ReplicationPort : 22236
Size : 819 MB
ServiceName : AppFabricCachingService
HighWatermark : 99%
LowWatermark : 90%
IsLeadHost : True
It's running, but when I run
Start-CacheHost -Computername "abc1" -CachePort 22233
it immediately displays:
HostName : CachePort   Service Name              Service Status   Version Info
abc1:22233             AppFabricCachingService   DOWN             3 [3,3][1,3]

Similar Messages

How to know (cmdlet) If my Distributed Cache hosts belong to the same Cluster or not ?

Forum,
Our Farm has two servers that are hosting and running the Distributed Cache service. How can I know if both servers/hosts belong to the exact same Cluster? What is the command for that?

Hi,
You can refer to the article below; it lists the PowerShell commands that show the details of each host inside the cluster.
http://almondlabs.com/blog/manage-the-distributed-cache/

Distributed cache

HI,
We have a server (Server 1), on which the status of the Distributed cache was in "Error Starting" state.
While applying a service pack, an issue prevented us from applying the patch on Server 1, so we decided to remove the affected server from the farm and work on it. The affected server (Server 1) was removed from the farm through the configuration wizard.
Even after running the configuration wizard, we could still see Server 1 on the SharePoint Central Administration site (Servers in farm). When we clicked it, the "Distributed cache" service was still visible with the status "Error Starting".
We tried deleting the server from the farm and got an error message; the ULS logs displayed the entries below.
A failure occurred in SPDistributedCacheServiceInstance::UnprovisionInternal. cacheHostInfo is null for host 'servername'.
8130ae9c-e52e-80d7-aef7-ead5fa0bc999
A failure occurred SPDistributedCacheServiceInstance::UnprovisionInternal()... isGraceFulShutDown 'False' , isGraceFulShutDown, Exception 'System.InvalidOperationException: cacheHostInfo is null at Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheServiceInstance.UnProvisionInternal(Boolean isGraceFulShutDown)'
8130ae9c-e52e-80d7-aef7-ead5fa0bc999
A failure occurred SPDistributedCacheServiceInstance::UnProvision() , Exception 'System.InvalidOperationException: cacheHostInfo is null at Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheServiceInstance.UnProvisionInternal(Boolean isGraceFulShutDown) at Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheServiceInstance.Unprovision()'
8130ae9c-e52e-80d7-aef7-ead5fa0bc999
We are unable to perform any install/repair operation of SharePoint on the affected server (Server 1), and because the server is no longer in the farm, we are unable to run any PowerShell commands.
Questions:-
What would cause that to happen?
Is there a way to resolve this issue? (please provide the steps)
Satyam

Hi
try this:
http://edsitonline.com/2014/03/27/unexpected-exception-in-feedcacheservice-isrepopulationneeded-unable-to-create-a-datacache-spdistributedcache-is-probably-down/
Hope this helps.

Distributed Cache queries

Hi,
In a distributed cache scheme (across multiple servers/JVMs):
1. How can I know which server is hosting which data (cache store), and which server holds the backup of that data?
2. Can this distribution be controlled? For example, can the 'xyz' cache store be required to live only on a specified server '123', with the backup of 'xyz' required to be on server '234'?
Thanks,
~Ravi Shanker

Hi,
In a redundancy system only one server will be serving and the secondary will be idle. I just want to ensure that these idle systems are also used instead of lying idle.
Hence the question of whether we can control the distribution logic, so that the least-used data can be moved to these idle systems and its usage redirected to them.
In a Coherence cluster, all the servers hold both primary and backup data; every server is serving requests and holding backups as well, so there are no idle systems.
But I have a few things that need clarification.
While running the sample programs as per the documentation, we need to start a Default Cache Server and the Java programs that join the cache server's cluster.
But I have seen that joining the cluster works even if the Default Cache Server is shut down?
Can you provide any info (links) or clarification on how the Cache Server and cluster mechanism work? I have gone through the documentation but none of it gives a clear picture.
This is a wrong assumption: every storage-enabled node can become a cluster member. DefaultCacheServer is one of the implementations for running a Coherence server.
HTH
Cheers,
_NJ

Distributed Cache service stuck in Starting Provisioning

Hello,
I'm having a problem with starting/stopping the Distributed Cache service on one of the SharePoint 2013 farm servers. Initially, Distributed Cache was enabled on all the farm servers by default and it was running as a cluster. I wanted to remove it from all hosts but one (APP server) using the PowerShell commands below, which worked fine.
Stop-SPDistributedCacheServiceInstance -Graceful
Remove-SPDistributedCacheServiceInstance
But later I attempted to add the service back to two hosts (WFE servers) using below command and unfortunately one of them got stuck in the process. When I look at the Services on Server from Central Admin, the status says "Starting".
Add-SPDistributedCacheServiceInstance
Also, when I execute below script, the status says "Provisioning".
Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status
I get "cacheHostInfo is null" error when I use "Stop-SPDistributedCacheServiceInstance -Graceful".
I tried below script,
$instanceName ="SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $instanceName -and ($_.server.name) -eq $env:computername}
$serviceInstance.Unprovision()
$serviceInstance.Delete()
but it didn't work either, and I got the error below.
"SPDistributedCacheServiceInstance", could not be deleted because other objects depend on it. Update all of these dependants to point to null or different objects and retry this operation. The dependant objects are as follows:
SPServiceInstanceJobDefinition Name=job-service-instance-{GUID}
Has anyone come across this issue? I would appreciate any help.
Thanks!

Hi ,
Are you able to ping the server that is already running Distributed Cache on this server? For example:
ping WFE01
As you are using more than one cache host in your server farm, you must configure the first cache host running the Distributed Cache service to allow Inbound ICMP (ICMPv4) traffic through the firewall. If an administrator removes the first cache host from the cluster which was configured to allow Inbound ICMP (ICMPv4) traffic through the firewall, you must configure the first server of the new cluster to allow Inbound ICMP (ICMPv4) traffic through the firewall.
You can create an inbound firewall rule to allow this traffic.
For more information, you can refer to the blog:
http://habaneroconsulting.com/insights/Distributed-Cache-Needs-Ping#.U4_nmPm1a3A
Thanks,
Eric
Eric Tao
TechNet Community Support

Foundation 2013 Farm and Distributed Cache settings

We are on a 3 tier farm - 1 WFE + 1APP + 1SQL - have had many issues with AppFab and Dist Cache; and an additional issue with noderunner/Search Services. Memory and CPU running very high. Read that we shouldn't be running Search and Dist Cache in the same server, nor using a WFE as a cache host. I don't have the budget to add another server in my environment.
I found an article (IderaWP_CachingFormSharePointPerformance.pdf) saying "To make use of SharePoint's caching capabilities requires a Server version of the platform." because it requires the publishing feature, which Foundation doesn't have.
So, I removed Distributed Cache (using PowerShell) from my deployment and disabled AppFabric. This resolved 90% of the server errors, but performance didn't improve. Now I'm not only getting errors in Central Admin (it expects Dist Cache), but I'm also getting disk operation readings of 4000 ms.
Questions:
1) Should I enable AppFab and disable cache?
2) Does Foundation support Dist Cache? Do I need to run Distributed Cache?
3) If so, can I run with just 1 cache host? If I shouldn't run it on a WFE or an App server with Search, do I have to stop Search all together? What happens with 2 tier farms out there?
4) Reading through the labyrinth of links on TechNet and MSDN on the subject, most of them says "Applies to SharePoint Server".
5) Anyone out there on a Foundation 2013 production environment that could share your experience?
Thanks in advance for any help with this!
Monica

That article is referring to BlobCache, not Distributed Cache. BlobCache requires Publishing, hence Server, but DistributedCache is required on all SharePoint 2013 farms, regardless of edition.
I would leave your DistCache on the WFE, given the App Server likely runs Search. Make sure you install AppFabric CU5 and make sure you make the changes as noted in the KB for AppFabric CU3.
You'll need to separately investigate your disk performance issues. Could be poor disk layout, under spec'ed disks, and so on. A detail into the disks that support SharePoint would be valuable (type, kind, RPM if applicable, LUNs in place, etc.).
Trevor Seward
This post is my own opinion and does not necessarily reflect the opinion or view of Microsoft, its employees, or other MVPs.

Write behind cache, DB down, when should the system stop taking new data in

Hello:
We are trying to use Coherence for our custom ESB, which is brokering payloads of various size between consumer and provider applications.
Before Coherence, stopping our DB meant organization-wide outage for critically important business services.
Since we have at least 40G of RAM in the production environment, we believe that our app can use the Coherence write-behind option to tolerate at least several hours' worth of DB outage.
We are currently using a near cache backed by distributed cache in write-behind mode.
9 business service JVMs (storage enabled=false) use 30 storage enabled JVMs.
IMPORTANT: We need to create an automated alerting facility that determines when the amount of unsaved data reaches a critical level after the DB goes down. This alert should help us decide when our application should stop accepting inbound traffic.
It is hard to use QueueSize parameter for that because our payload memory footprint can vary from 1KB to 3MB.
We do not expire any entries in order to enable support queries against the cache during DB outage.
Our experiments with various flavors of overflow-scheme resulted in OutOfMemoryError, therefore we decided to implement a RAM-only cache as a first step.
<near-scheme>
  <scheme-name>message_payload_scheme</scheme-name>
  <front-scheme>
    <local-scheme>
      <scheme-ref>limited_entities_front_scheme</scheme-ref>
      <high-units>100</high-units>
    </local-scheme>
  </front-scheme>
  <back-scheme>
    <distributed-scheme>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme>
              <scheme-ref>limited_bytes_scheme</scheme-ref>
              <high-units>199229440</high-units>
            </local-scheme>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>com.comp.MessagePayloadStore</class-name>
            </class-scheme>
          </cachestore-scheme>
          <read-only>false</read-only>
          <write-delay-seconds>3</write-delay-seconds>
          <write-requeue-threshold>2147483646</write-requeue-threshold>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </back-scheme>
</near-scheme>
<local-scheme>
  <scheme-name>limited_entities_front_scheme</scheme-name>
  <eviction-policy>LRU</eviction-policy>
  <unit-calculator>FIXED</unit-calculator>
</local-scheme>
<local-scheme>
  <scheme-name>limited_bytes_scheme</scheme-name>
  <eviction-policy>HYBRID</eviction-policy>
  <unit-calculator>BINARY</unit-calculator>
</local-scheme>

Good info ... I feel like I need to restate my original question along with a couple of new questions caused by the discussion above.
Q1. Does Coherence evict 'dirty', or 'queued', or 'unsaved' objects for cache configuration provided above?
The answer should be 'NO', otherwise Coherence is unsafe to use as a system of record,
it should not just drop unsaved information on the floor.
Q2. What happens to the front tier of the near+partitioned write behind cache described above when amount of unsaved data exceeds max cache capacity defined via high-units?
I would expect that map.put starts throwing exceptions: cache storage is full, so it should not accept more data
Q3. How can I determine the moment when the amount of dirty data in bytes(!), not in objects, hits 85% of
the max allowed cache capacity configured in bytes (using the high-units param and BINARY calculator)?
'DirtyUnits' counter can probably be built with some lower-level Coherence API. Can we use
this API?
Please, understand, that we purchased Coherence for reliability, for making our
system independent from short DB outages, for keeping our business services up
and running when DBA need some time for admin operations like rebuilding an index.
Performance benefits are secondary and are not as obvious for our system which
uses primary keys only and has a well-tuned co-located Oracle back-end.
We simply cannot put Coherence to production unless we prove that Coherence
can reliably hold the data and give us information about approaching crisis
(the cache full of unsaved data).
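The byte-based alert asked for in Q3 could be prototyped roughly as follows. This is only a sketch in Python, not a Coherence API: `DirtyBytesMonitor`, `on_put` and `on_stored` are hypothetical names, and it assumes the application can hook both the cache write and the moment the cache store confirms the database write. Whether Coherence itself exposes an equivalent counter is not established in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class DirtyBytesMonitor:
    """Tracks bytes queued for write-behind but not yet persisted (hypothetical helper)."""

    capacity_bytes: int          # assumed to mirror high-units with a BINARY calculator
    alert_ratio: float = 0.85    # alert threshold taken from Q3 (85% of capacity)
    _dirty: dict = field(default_factory=dict, init=False, repr=False)

    def on_put(self, key, payload_size: int) -> None:
        # An entry is dirty from the moment it is written until the store confirms it.
        self._dirty[key] = payload_size

    def on_stored(self, key) -> None:
        # Called once the database write for this key has succeeded.
        self._dirty.pop(key, None)

    @property
    def dirty_bytes(self) -> int:
        return sum(self._dirty.values())

    def should_alert(self) -> bool:
        return self.dirty_bytes >= self.alert_ratio * self.capacity_bytes
```

Counting bytes rather than entries matters here because payloads range from 1KB to 3MB, so a queue-length threshold would fire far too early or far too late.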
If possible, forward this message to Cameron Purdy,
who was presenting Coherence to our team several months ago.
Thanks,
Vasili Smaliak
Applications Architect, Enterprise App Integration
GMAC ResCap
[email protected]

Error handling for distributed cache synchronization

Hello,
Can somebody explain to me how the error handling works for the distributed cache synchronization ?
Say I have four nodes of a weblogic cluster and 4 different sessions on each one of those nodes.
On Node A an update happens on object B. This update is going to be propagated to all the other nodes B, C, D. But for some reason the connection between node A and node B is lost.
In the following xml
<cache-synchronization-manager>
<clustering-service>...</clustering-service>
<should-remove-connection-on-error>true</should-remove-connection-on-error>
If I set this to true, does this mean that TopLink will stop sending updates from node A to node B? I presume all of this is transparent: in order to handle any errors, I do not have to write any code to capture this kind of error.
Is that correct ?
Aswin.

This "should-remove-connection-on-error" option mainly applies to RMI or RMI_IIOP cache synchronization. If you use JMS for cache synchronization, then connectivity and error handling is provided by the JMS service.
For RMI, when this is set to true (which is the default) if a communication exception occurs in sending the cache synchronization to a server, that server will be removed and no longer synchronized with. The assumption is that the server has gone down, and when it comes back up it will rejoin the cluster and reconnect to this server and resume synchronization. Since it will have an empty cache when it starts back up, it will not have missed anything.
You do not have to perform any error handling, however if you wish to handle cache synchronization errors you can use a TopLink Session ExceptionHandler. Any cache synchronization errors will be sent to the session's exception handler and allow it to handle the error or be notified of the error. Any errors will also be logged to the TopLink session's log.
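The behaviour described in this reply can be illustrated with a small toy model. It is a sketch in Python, not TopLink code: `CacheSyncManager`, `propagate` and `rejoin` are hypothetical names, and a Python `ConnectionError` stands in for the RMI communication exception.

```python
class CacheSyncManager:
    """Toy model of propagating cache updates to peer nodes (not the TopLink API)."""

    def __init__(self, peers, remove_connection_on_error=True, exception_handler=None):
        self.peers = dict(peers)  # peer name -> callable(update) that may raise
        self.remove_connection_on_error = remove_connection_on_error
        self.exception_handler = exception_handler

    def propagate(self, update):
        # Iterate over a copy so peers can be dropped while looping.
        for name, send in list(self.peers.items()):
            try:
                send(update)
            except ConnectionError as exc:
                # Optional hook, analogous to a session exception handler.
                if self.exception_handler is not None:
                    self.exception_handler(name, exc)
                # The failed peer is assumed to be down and is no longer synchronized with.
                if self.remove_connection_on_error:
                    del self.peers[name]

    def rejoin(self, name, send):
        # A restarted peer reconnects with an empty cache and resumes receiving updates.
        self.peers[name] = send
```

With the flag set, a failure on the link to node B costs one failed send; later updates go only to C and D until B rejoins.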

Newsfeed error - The operation failed because the server could not access the distributed cache.

Recently installed SharePoint 2013 RTM, on the newsfeed page an error is displayed, and no entries display in the following or everyone tabs.
"The operation failed because the server could not access the distributed cache."
Reading through various posts, I've checked:
- Activity feeds and mentions tabs are working as expected.
- User Profile Service is operational and syncing as expected
- Search is operational and indexing as expected
- The farm was installed based on the autospinstaller scripts.
- Don't believe this to be a permissions issue, during testing added accounts to the admin group to verify
Any suggestions are welcomed, thanks.
The full error message and trace logs is as follows.
SharePoint returned the following error: The operation failed because the server could not access the distributed cache. Internal type name: Microsoft.Office.Server.Microfeed.MicrofeedException. Internal error code: 55. Contact your system administrator for help in resolving this problem.
From the trace logs there's several messages which are triggered around the same time:
http://msdn.microsoft.com/en-AU/library/System.ServiceModel.Diagnostics.TraceHandledException.aspx
Handling an exception. Exception details: System.ServiceModel.FaultException`1[Microsoft.Office.Server.UserProfiles.FeedCacheFault]: Unexpected exception in FeedCacheService.GetPublishedFeed: Object reference not set to an instance of an object.. (Fault Detail is equal to Microsoft.Office.Server.UserProfiles.FeedCacheFault).
/LM/W3SVC/2/ROOT/d71732192b0d4afdad17084e8214321e-1-129962393079894191
System.ServiceModel.FaultException`1[[Microsoft.Office.Server.UserProfiles.FeedCacheFault, Microsoft.Office.Server.UserProfiles, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c]], System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Unexpected exception in FeedCacheService.GetPublishedFeed: Object reference not set to an instance of an object..
at Microsoft.Office.Server.UserProfiles.FeedCacheService.Microsoft.Office.Server.UserProfiles.IFeedCacheService.GetPublishedFeed(FeedCacheRetrievalEntity fcTargetEntity, FeedCacheRetrievalEntity fcViewingEntity, FeedCacheRetrievalOptions fcRetOptions)
at SyncInvokeGetPublishedFeed(Object , Object[] , Object[] )
at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage31(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)
System.ServiceModel.FaultException`1[Microsoft.Office.Server.UserProfiles.FeedCacheFault]: Unexpected exception in FeedCacheService.GetPublishedFeed: Object reference not set to an instance of an object.. (Fault Detail is equal to Microsoft.Office.Server.UserProfiles.FeedCacheFault).
SPSocialFeedManager.GetFeed: Exception: Microsoft.Office.Server.Microfeed.MicrofeedException: ServerErrorFetchingConsolidatedFeed : ( Unexpected exception in FeedCacheService.GetPublishedFeed: Object reference not set to an instance of an object.. ) : Correlation ID:db6ddc9b-8d2e-906e-db86-77e4c9fab08f : Date and Time : 31/10/2012 1:40:20 PM
at Microsoft.Office.Server.Microfeed.SPMicrofeedThreadCollection.PopulateConsolidated(SPMicrofeedRetrievalOptions retOptions, SPMicrofeedContext context)
at Microsoft.Office.Server.Microfeed.SPMicrofeedThreadCollection.Populate(SPMicrofeedRetrievalOptions retrievalOptions, SPMicrofeedContext context)
at Microsoft.Office.Server.Microfeed.SPMicrofeedManager.CommonGetFeedFor(SPMicrofeedRetrievalOptions retrievalOptions)
at Microsoft.Office.Server.Microfeed.SPMicrofeedManager.CommonPubFeedGetter(SPMicrofeedRetrievalOptions feedOptions, MicrofeedPublishedFeedType feedType, Boolean publicView)
at Microsoft.Office.Server.Microfeed.SPMicrofeedManager.GetPublishedFeed(String feedOwner, SPMicrofeedRetrievalOptions feedOptions, MicrofeedPublishedFeedType typeOfPubFeed)
at Microsoft.Office.Server.Social.SPSocialFeedManager.Microsoft.Office.Server.Social.ISocialFeedManagerProxy.ProxyGetFeed(SPSocialFeedType type, SPSocialFeedOptions options)
at Microsoft.Office.Server.Social.SPSocialFeedManager.<>c__DisplayClass4b`1.<S2SInvoke>b__4a()
Microsoft.Office.Server.Social.SPSocialFeedManager.GetFeed: Microsoft.Office.Server.Microfeed.MicrofeedException: ServerErrorFetchingConsolidatedFeed : ( Unexpected exception in FeedCacheService.GetPublishedFeed: Object reference not set to an instance of an object.. ) : Correlation ID:db6ddc9b-8d2e-906e-db86-77e4c9fab08f : Date and Time : 31/10/2012 1:40:20 PM
at Microsoft.Office.Server.Microfeed.SPMicrofeedThreadCollection.PopulateConsolidated(SPMicrofeedRetrievalOptions retOptions, SPMicrofeedContext context)
at Microsoft.Office.Server.Microfeed.SPMicrofeedThreadCollection.Populate(SPMicrofeedRetrievalOptions retrievalOptions, SPMicrofeedContext context)
at Microsoft.Office.Server.Microfeed.SPMicrofeedManager.CommonGetFeedFor(SPMicrofeedRetrievalOptions retrievalOptions)
at Microsoft.Office.Server.Microfeed.SPMicrofeedManager.CommonPubFeedGetter(SPMicrofeedRetrievalOptions feedOptions, MicrofeedPublishedFeedType feedType, Boolean publicView)
at Microsoft.Office.Server.Microfeed.SPMicrofeedManager.GetPublishedFeed(String feedOwner, SPMicrofeedRetrievalOptions feedOptions, MicrofeedPublishedFeedType typeOfPubFeed)
at Microsoft.Office.Server.Social.SPSocialFeedManager.Microsoft.Office.Server.Social.ISocialFeedManagerProxy.ProxyGetFeed(SPSocialFeedType type, SPSocialFeedOptions options)
at Microsoft.Office.Server.Social.SPSocialFeedManager.<>c__DisplayClass4b`1.<S2SInvoke>b__4a()
at Microsoft.Office.Server.Social.SPSocialUtil.InvokeWithExceptionTranslation[T](ISocialOperationManager target, String name, Func`1 func)
Microsoft.Office.Server.Social.SPSocialFeedManager.GetFeed: Microsoft.Office.Server.Social.SPSocialException: The operation failed because the server could not access the distributed cache. Internal type name: Microsoft.Office.Server.Microfeed.MicrofeedException. Internal error code: 55.
at Microsoft.Office.Server.Social.SPSocialUtil.TryTranslateExceptionAndThrow(Exception exception)
at Microsoft.Office.Server.Social.SPSocialUtil.InvokeWithExceptionTranslation[T](ISocialOperationManager target, String name, Func`1 func)
at Microsoft.Office.Server.Social.SPSocialFeedManager.<>c__DisplayClass48`1.<S2SInvoke>b__47()
at Microsoft.Office.Server.Social.SPSocialUtil.InvokeWithExceptionTranslation[T](ISocialOperationManager target, String name, Func`1 func)

Thanks Thuan,
I've restarted the Distributed Cache service and the error is still occurring.
The AppFabric Caching Service is running under the service apps account, and does appear operational based on:
> use-cachecluster
> get-cache
CacheName                                                                          [Host]
                                                                                   Regions
default
DistributedAccessCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9
DistributedActivityFeedCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9
DistributedActivityFeedLMTCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9               [SERVER:22233]
                                                                                   LMT(Primary)
DistributedBouncerCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9
DistributedDefaultCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9
DistributedLogonTokenCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9                    [SERVER:22233]
                                                                                   Default_Region_0538(Primary)
                                                                                   Default_Region_0004(Primary)
                                                                                   Default_Region_0451(Primary)
DistributedSearchCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9
DistributedSecurityTrimmingCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9
DistributedServerToAppServerAccessTokenCache_1e9f4999-0187-40e8-aa92-f8308d47d6e9

Different distributed caches within the cluster

Hi,
I have three machines, n1, n2 and n3, that host Tangosol. Two of them act as the primary distributed cache and the third acts as the secondary cache. I also have WebLogic running on n1, which, based on some requests, pumps data onto the distributed cache on n1 and n2. I have a listener configured on n1 and n2, and on the entry-deleted event I would like to populate the Tangosol distributed service running on n3. All three nodes are within the same cluster.
I would like to ensure that the data coming directly from WebLogic is distributed only across n1 and n2 and NOT n3. For example, say I do not start an instance of Tangosol on node n3 and an object gets pruned from either n1 or n2. Ideally I should get a storage-not-configured exception, but that does not happen.
The point is that the moment I call CacheFactory.getCache("Dist:n3") in the cache listener, Tangosol populates the secondary cache by creating an instance of Dist:n3 on either n1 or n2, depending on where the object was pruned from.
From my understanding, I don't think we can have a config file on n1 and n2 that does not have a scheme for n3. I tried doing that and got an IllegalStateException.
My next step was to define the Dist:n3 scheme on n1 and n2 with local storage false, and have a similar config file on n3 with local-storage for Dist:n3 as true and local storage for the primary cache as false.
Can I configure local-storage specific to a cache rather than to a node?
I also have an EJB deployed on WebLogic that also serves a getData request, i.e. this EJB will also check the primary cache and the secondary cache for data. I would have the statement
NamedCache n3 = CacheFactory.getCache("n3") in the bean as well.

Hi Jigar,
i've three machines n1 , n2 and n3 respectively that
host tangosol. 2 of them act as the primary
distributed cache and the third one acts as the
secondary cache.
First, I am curious as to the requirements that drive this configuration setup.
i would like to ensure that the data directly coming
from weblogic should only be distributed across n1
and n2 and NOT n3. for e.g. i do not start an
instance of tangosol on node n3. and an object gets
pruned from either n1 or n2. so ideally i should get
a storage not configured exception which does not
happen.
The point is the moment is say
CacheFactory.getCache("Dist:n3") in the cache
listener, tangosol does populate the secondary cache
by creating an instance of Dist:n3 on either n1 or n2
depending from where the object has been pruned.
from my understanding i dont think we can have a
config file on n1 and n2 that does not have a scheme
for n3. i tried doing that and got an illegalstate
exception.
my next step was to define the Dist:n3 scheme on n1
and n2 with local storage false and have a similar
config file on n3 with local-storage for Dist:n3 as
true and local storage for the primary cache as
false.
can i configure local-storage specific to a cache
rather than to a node.
i also have an EJB deployed on weblogic that also
entertains a getData request. i.e. this ejb will also
check the primary cache and the secondary cache for
data. i would have the statement
NamedCache n3 = CacheFactory.getCache("n3") in the
bean as well.
In this scenario, I would recommend having the "primary" and "secondary" caches on different cache services (i.e. distributed-scheme/service-name). Then you can configure local storage on a service by service basis (i.e. distributed-scheme/local-storage).
Later,
Rob Misek
Tangosol, Inc.

Setup failover for a distributed cache

Hello,
For our production setup we will have 4 app servers with one clone per app server, so there will be 4 clones in a cluster. We will also have 2 JVMs for our distributed cache, one being a failover; both of those will be in the cluster.
How would I configure the failover for the distributed cache?
Thanks

user644269 wrote:
Right - so each of the near cache schemes defined would need to have the back map high-units set to where it could take on 100% of data.
Specifically the near-scheme/back-scheme/distributed-scheme/backing-map-scheme/local-scheme/high-units value (take a look at Cache Configuration Elements: http://coherence.oracle.com/display/COH34UG/Cache+Configuration+Elements).
There are two options:
1) No Expiry -- In this case you would have to size the storage enabled JVMs so that an individual JVM could store all of the data.
or
2) Expiry -- In this case you would set the high-units a value that you determine. If you want it to store all the data then it needs to be set higher than the total number of objects that you will store in the cache at any given time or you can set it lower with the understanding that once that high-units is reached Coherence will evict some data from the cluster (i.e. remove it from the "cluster memory").
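The sizing rule behind option 1 can be written as a small calculation. This is a rough sketch in Python under stated assumptions: `high_units_for_failover` is a hypothetical helper, data is assumed to spread evenly over the surviving storage JVMs, and only primary data is counted (backup copies and JVM heap overhead are ignored).

```python
import math


def high_units_for_failover(total_units, storage_jvms, failures_tolerated=1):
    """Units each storage JVM must be able to hold so that the survivors
    can still store all primary data after the given number of JVM failures."""
    survivors = storage_jvms - failures_tolerated
    if survivors < 1:
        raise ValueError("at least one storage JVM must survive")
    # Assumes an even spread of primary data across the surviving JVMs.
    return math.ceil(total_units / survivors)
```

For the two-JVM setup in this thread, `high_units_for_failover(total, 2)` returns the full total, which matches the advice that a single JVM must be able to take on 100% of the data.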
user644269 wrote:
Other than that - there is no configuration needed to ensure that these JVMs act as a failover in the event one goes down.
Correct, data fault tolerance is on by default (set to one level of redundancy).
:Rob:
Coherence Team

Distributed cache during solution deployment

Hi,
We are using MySite newsfeed.
What is the best practice for deploying a solution so that the Distributed Cache is not affected?
Last time, when we did an IIS reset, the feed was lost and we had to use the repopulation job to pull the data back. Is there a better way to follow during deployments and server upgrades?
Thanks,
Sudan

Hi Sudan,
The Distributed Cache service stores data in-memory only, so executing iisreset might cause cache flush. Please refer to the thread below to move all cached items from local cache to other cache host in the cluster:
http://social.technet.microsoft.com/Forums/sharepoint/en-US/6a415c75-4ca3-4c43-9110-25a68db93a54/sharepoint-2013-my-site-newsfeed-posts-disappear?forum=sharepointgeneral
Regards,
Rebecca Tu
TechNet Community Support

Node/Machine fail behavior of distributed caches

My high level question is: what happens to a distributed cache when nodes fail?
We have 2 servers which run 4 JVMs each. We have the default of 1 backup set.
What happens when an entire machine fails (all 4 JVMs go down with the ship)?
What happens when I stop and restart each JVM one at a time?
My main concern is data-loss. Since I have backup set to 1 my expectations for both of my scenarios above is that I would lose no cached data, but that does not appear to be the case. I am left wondering in what scenarios the backups help.
How does the cluster tell the difference between (a) a node failed but will be restored soon enough so don't reduce the cluster size, and (b) a node was removed and will never come back so reduce the cluster size?
It would be nice to see a wiki page that describes the gory details of how the cluster handles various failure scenarios.

Each partition is allocated to a JVM and a backup of that partition is allocated to another JVM. If you are running on multiple physical machines then Coherence will put the backup partition on another machine to the primary. You can tell how successful Coherence has been at doing this by looking at the StatusHA value for your services in JMX using something like JConsole. If the backup partitions are on different machines to the primary partitions the StatusHA value will say MACHINE-SAFE, if the backup is on the same machine as the primary the StatusHA value will be NODE-SAFE and if there is no backup the StatusHA value will be ENDANGERED.
There is also a status called BALANCED, which means that besides being MACHINE-SAFE, the partitions are also as evenly distributed between nodes (not boxes) as possible.
When you lose a JVM (or multiple JVMs if you lose a whole machine) this causes a partition loss event for the partitions that were allocated to the dead JVMs. In the case of losing a single JVM the backup partition now becomes the primary and a new backup is created (following the same rules about creating the backup on another machine if possible). If you lose a whole machine then the same thing happens but on a bigger scale.
A small correction: a partition loss event happens when you lose both the primary and all backups. What you described is not a partition loss, as a backup is there and is promoted to primary.
Also, losing a whole machine is the same only in the case when you were machine-safe (or at least those partitions which had primaries on the lost box were machine-safe). If those partitions were not machine-safe, then you would have lost partitions, as all copies of the non-machine-safe partitions on that box were lost.
Other than that it does happen as described.
In your case you should not necessarily see data loss if you kill a single node from the cluster you described, and neither should you lose data if you kill a whole machine, provided, as mentioned, that the cluster or at least the partitions having primaries on the killed box are machine-safe.
There are scenarios where data loss can occur, for example losing two JVMs on different machines at exactly the same time. This is because there is a very high chance that those two JVMs shared primary and backups for at least one partition.
If you lose a JVM the cluster size will always be reduced. It cannot be anything else, as a node has just departed the cluster.
The above descriptions may be a bit simplified but I think they are close enough to describe what you wanted to know.
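The placement rules described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Coherence's actual distribution algorithm: the round-robin placement, the function names, and the figure of 257 partitions are all made up for illustration. It mimics the question's setup (2 machines with 4 JVMs each, 1 backup) and checks which simultaneous failures leave a partition with no surviving copy.

```python
# Toy model of primary/backup partition placement (NOT Coherence's real algorithm).
# A node is a (machine, jvm) pair; every partition gets one primary and one backup.

def assign_partitions(machines, jvms_per_machine, partitions):
    """Return {partition: (primary_node, backup_node)}, preferring a backup on another machine."""
    nodes = [(m, j) for m in range(machines) for j in range(jvms_per_machine)]
    layout = {}
    for p in range(partitions):
        primary = nodes[p % len(nodes)]
        other_machine = [n for n in nodes if n[0] != primary[0]]
        other_node = [n for n in nodes if n != primary]
        candidates = other_machine or other_node  # fall back to same machine if needed
        backup = candidates[(p // len(nodes)) % len(candidates)] if candidates else None
        layout[p] = (primary, backup)
    return layout

def status_ha(layout):
    """Classify the layout using the StatusHA names mentioned in the thread."""
    if any(backup is None for _, backup in layout.values()):
        return "ENDANGERED"
    if all(primary[0] != backup[0] for primary, backup in layout.values()):
        return "MACHINE-SAFE"
    return "NODE-SAFE"

def lost_partitions(layout, failed_nodes):
    """Partitions whose primary and backup both sit on nodes that failed at the same instant."""
    failed = set(failed_nodes)
    return [p for p, (primary, backup) in layout.items()
            if primary in failed and (backup is None or backup in failed)]

layout = assign_partitions(machines=2, jvms_per_machine=4, partitions=257)
whole_machine = [(0, j) for j in range(4)]

print(status_ha(layout))                                  # MACHINE-SAFE
print(len(lost_partitions(layout, whole_machine)))        # 0: every backup is on the other machine
print(len(lost_partitions(layout, [(0, 0), (1, 0)])) > 0) # True: two JVMs on different machines
```

The model ignores rebalancing, so it only covers failures that happen at the same instant; when JVMs are stopped one at a time, the outcome also depends on whether new backups have been created before the next JVM goes down.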
JK
Best regards,
Robert

How can I configure Distributed Cache servers and front-end servers for the Streamlined topology in SharePoint 2013?

My question is about SharePoint 2013 farm topology. If I want to go with the Streamlined topology and have 2 Distributed Cache and RM servers + 2 front-end servers + 2 batch-processing servers + a clustered SQL Server, how will the Distributed Cache servers connect to the front-end servers? Can I use the Windows 2012 NLB feature? If I use NLB, do I need to install NLB on all Distributed Cache servers and front-end servers and split out services? What would the configuration be for my scenario?
Thanks in advance!

For the Distributed Cache servers, you simply make them farm members (like any other SharePoint servers) and turn on the Distributed Cache service (while making sure it is disabled on all other farm members). Then, validate that no other services (except for the Foundation Web service, due to ease of solution management) are enabled on the DC servers and that no end user requests or crawl requests are being routed to the DC servers. You do not need or use NLB for DC.
Trevor Seward
This post is my own opinion and does not necessarily reflect the opinion or view of Microsoft, its employees, or other MVPs.

Limitation on number of objects in distributed cache

Hi,
Is there a limitation on the number (or total size) of objects in a distributed cache? I am seeing a big increase in response time when the number of objects exceeds 16,000. Normally, the ServiceMBean.RequestAverageDuration value is in the 6-8ms range as long as the number of objects in the cache is less than 16K - I've run our application for weeks at a time without seeing any problems. However, once the number of objects exceeds the magic number of 16K the average request duration almost immediately jumps to over 100ms and continues to climb as more objects are added.
I'm fairly confident that the cache is indexed properly (as Dimitri helped us with that). Are there any configuration changes that could possibly help out here? We are using Coherence 3.3.
Any suggestions would be greatly appreciated.
Thanks,
Jim

Hi Jim,
The results from the load test look quite normal: the system fairly quickly stabilizes at a particular performance level and remains there for the duration of the test. In terms of latency results, we see that the cache.putAll operations are taking ~45ms per bulk operation, where each operation is putting 100 1K items; for cache.getAll operations we see ~15ms per bulk operation. Additionally, note that the test runs over 256,000 items, so it is well beyond the 16,000 limit you've encountered.
So it looks like your application is exhibiting different behavior than this test. You may wish to try to configure this test to behave as similarly to yours as possible. For instance, you can set the size of the cache to just over/under 16,000 using the -entries parameter, set the size of the entries to 900 bytes using the -size parameter, and set the total number of threads per worker using the -threads parameter.
What is quite interesting is that at 256,000 1K objects the latency measured with this test is apparently less than half the latency you are seeing with a much smaller cache size. This would seem to point at the issue being related to or rooted in your test. Would you be able to provide a more detailed description of how you are using the cache, and the types of operations you are performing?
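To compare numbers like these on either side of the 16,000 mark, a measurement loop along the following lines may help. It is a minimal sketch, not the load test referred to above: a plain Python dict stands in for the cache client, and the function name `time_bulk_ops` and its parameters are hypothetical, loosely mirroring the -entries and -size options. Replace the two marked lines with the real bulk put/get calls of your client.

```python
import time

def time_bulk_ops(cache, entries, size, batch=100):
    """Return (avg_put_ms, avg_get_ms) per bulk operation of `batch` items.

    `cache` is any dict-like object; here it is only a stand-in for a real cache client.
    """
    payload = bytes(size)  # one value of `size` bytes, reused for every key
    put_ms, get_ms = [], []
    for start in range(0, entries, batch):
        keys = range(start, min(start + batch, entries))
        t0 = time.perf_counter()
        cache.update({k: payload for k in keys})   # stand-in for a bulk put
        t1 = time.perf_counter()
        values = [cache[k] for k in keys]          # stand-in for a bulk get
        t2 = time.perf_counter()
        assert len(values) == len(keys)
        put_ms.append((t1 - t0) * 1000)
        get_ms.append((t2 - t1) * 1000)
    return sum(put_ms) / len(put_ms), sum(get_ms) / len(get_ms)

# Run once just under and once just over the size where the slowdown was reported.
for entries in (15_000, 17_000):
    put_avg, get_avg = time_bulk_ops({}, entries, size=900)
    print(f"{entries} entries: put {put_avg:.3f} ms, get {get_avg:.3f} ms per batch")
```

If the real client shows flat latency across both runs, that would support the suggestion above that the slowdown comes from how the application uses the cache rather than from the cache size itself.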
thanks,
mark
