Troubleshooting Error Messages

Error messages provide information about problems that might occur when setting up the SnappyData cluster or when running queries.
You can use the following information to resolve such problems.

The following topics are covered in this section:

Error Message: Region {0} has potentially stale data. It is waiting for another member to recover the latest data. My persistent id:
{1}
Members with potentially new data:
{2}
Use the "{3} list-missing-disk-stores" command to see all disk stores that are being waited on by other members.

Diagnosis:
The above message is typically displayed during startup when the table data on a member's disk is not the most recent. The member then waits for the other members in the cluster, which may hold newer data, to become available.
In such cases, the snappy-status-all.sh command reports the status of the member as waiting.

Solution:
The status of the waiting members changes to online once all the members are online and the waiting members have recovered the latest data. You can check whether the status has changed from waiting to online by using the snappy-status-all.sh command or by checking the SnappyData Pulse UI.
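The check described above can be automated with a small polling loop. The following is a minimal sketch, not part of the product: it assumes that the status output contains the word "waiting" for any member that has not yet come online, and the function names, the polling interval, and the timeout are hypothetical.

```python
import time
from typing import Callable


def any_member_waiting(status_output: str) -> bool:
    """Return True if any line of the status output mentions 'waiting'.

    Assumption: a waiting member is reported with the word 'waiting'
    somewhere on its status line (case-insensitive match).
    """
    return any("waiting" in line.lower() for line in status_output.splitlines())


def wait_until_online(
    get_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until no member is waiting, or until the timeout.

    Returns True if no member is reported as waiting before the deadline,
    and False otherwise.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if not any_member_waiting(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)
```

Here get_status is any callable that returns the text printed by snappy-status-all.sh, for example a wrapper around subprocess.run that captures the standard output of the script. The location of the script depends on your installation.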

Error Message: XCL54.T Query/DML/DDL '{0}' canceled due to low memory on member '{1}'. Try reducing the search space by adding more filter conditions to the where clause. query

Diagnosis:
This error message is reported when a system runs low on available memory. In such cases, queries may be aborted and an error is reported to prevent the server from crashing due to low available memory.
Once the heap memory usage falls below critical-heap-percentage, the queries run successfully.

Solution:
To avoid such issues, review your memory configuration and make sure that you have allocated enough heap memory.
You can also configure tables for eviction so that table rows are evicted from memory and overflow to disk when the system crosses the eviction threshold. For more details, refer to the best practices for memory management.
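To make the threshold concrete, the arithmetic behind critical-heap-percentage can be sketched as follows. This is a simplification for illustration only: it treats the percentage as a fraction of the whole heap, whereas the server may measure usage against a specific heap region. The function names and the sample values are hypothetical.

```python
def critical_heap_bytes(max_heap_bytes: int, critical_heap_percentage: float) -> float:
    """Return the heap usage, in bytes, at which the critical threshold is reached."""
    if not 0 <= critical_heap_percentage <= 100:
        raise ValueError("critical_heap_percentage must be between 0 and 100")
    return max_heap_bytes * critical_heap_percentage / 100.0


def is_above_critical(
    used_heap_bytes: int, max_heap_bytes: int, critical_heap_percentage: float
) -> bool:
    """Return True if heap usage exceeds the critical threshold."""
    return used_heap_bytes > critical_heap_bytes(max_heap_bytes, critical_heap_percentage)


# Illustrative values only: a heap of 1000 units with the threshold set to 90
# gives a critical threshold of 900 units.
print(critical_heap_bytes(1000, 90))
print(is_above_critical(950, 1000, 90))  # usage above the threshold
print(is_above_critical(850, 1000, 90))  # usage below the threshold
```

If the heap usage of a member regularly sits close to this threshold under normal load, that suggests the heap is undersized for the workload or that more data should be allowed to overflow to disk.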

Message: {0} seconds have elapsed while waiting for reply from {1} on {2} whose current membership list is: [{3}]

Diagnosis:
The above warning message is displayed when a member is waiting for a response from another member in the system and the response has not been received for some time.

Solution:
This generally means that there is a resource issue, most likely in the member being waited for. Check whether garbage collection activity is going on in that member. Due to large GC pauses, the member may not respond within the stipulated time. In such cases, review your memory configuration and consider using off-heap memory.

Error Message: Region {0} bucket {1} has persistent data that is no longer online stored at these locations: {2}

Diagnosis:
In partitioned tables that are persisted to disk, if any of the members are offline, the partitioned table is still available but may have some buckets represented only in offline disk stores. In this case, methods that access the bucket entries report a PartitionOfflineException error.

Solution:
If possible, bring the missing member online. This restores the buckets to memory and you can work with them again.

Error Message: ForcedDisconnectException Error: "No Data Store found in the distributed system for: {0}"

Diagnosis:
The Cache and DistributedSystem of a distributed system member are forcibly closed by the system membership coordinator if the member becomes sick or too slow to respond to heartbeat requests. The log file for the member displays a ForcedDisconnectException with the above message.
One possible reason is that large GC pauses make the member unresponsive while the GC is in progress.

Solution:
To minimize the chances of this happening, you can increase the DistributedSystem property member-timeout. Note that this setting also controls the length of time required to notice a network failure, so a larger value delays failure detection. Also, review your memory configuration and consider using off-heap memory.
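When deciding whether member-timeout needs to be increased, it can help to compare the GC pause durations observed on a member with the configured timeout. The following is a minimal sketch under stated assumptions: the pause durations have already been extracted from the GC logs, and the function name, the safety margin of 0.8, and the sample timeout of 5000 milliseconds are illustrative choices, not product defaults.

```python
from typing import List


def risky_pauses(
    pause_ms: List[float], member_timeout_ms: float, margin: float = 0.8
) -> List[float]:
    """Return the GC pauses that reach at least margin * member_timeout_ms.

    A pause this close to the timeout leaves the member little time to
    answer heartbeat requests before it is considered unresponsive.
    """
    threshold = member_timeout_ms * margin
    return [p for p in pause_ms if p >= threshold]


# Hypothetical pause durations (milliseconds) taken from a GC log.
observed = [120, 4500, 900, 6100]
print(risky_pauses(observed, member_timeout_ms=5000))
```

If the returned list is not empty, reducing the pauses (for example by reviewing the heap configuration or using off-heap memory) is usually preferable to raising member-timeout alone, because a higher timeout also delays the detection of real failures.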

Error Message: Node went down or data no longer available while iterating the results.

Diagnosis:
If a node fails while a JDBC/ODBC client or job is consuming the result of a query, the query can fail with this exception.

Solution:
This is expected behaviour. The product does not retry the query, because partial results have already been consumed by the application. The application must retry the entire query after discarding any changes made on the basis of the partial results that were consumed.
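The retry pattern described above can be sketched as follows. This is an illustrative helper rather than a product API: run_query stands for any callable that starts the query afresh and returns an iterable of rows, and the function name, the set of retryable exception types, and the attempt limit are assumptions chosen by the application.

```python
from typing import Callable, Iterable, List, Tuple, Type


def fetch_all_with_retry(
    run_query: Callable[[], Iterable[tuple]],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
) -> List[tuple]:
    """Run the whole query again if it fails while results are being consumed.

    Each attempt collects rows into a fresh list, so rows read before a
    failure are discarded and never mixed with rows from the retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        rows: List[tuple] = []  # fresh buffer: partial results are dropped
        try:
            for row in run_query():
                rows.append(row)
            return rows
        except retryable:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
    raise AssertionError("unreachable")
```

Buffering the rows before acting on them keeps the retry safe. If the application instead processes rows as they arrive, it must undo the effects of the rows already processed before it runs the query again.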