Kafka Stability and Reconciliation

Model: Default (GPT-3.5)

User
Elastic Reconciliation Report for Failed Events

Server - ST2 (microservice running on server 45)

Targeted Events - {"addIndividual","addOrganization","addPartyRole","addSubscriber","createBatchOrder","createCustomerOrder","enrichAddon","enrichAttribute","enrichPayer","enrichUser","orderLineCease","terminatePartner","updateCustomerOrder","updatePartyRole","updatePartyRoleByAssocId","updateSalesOrder","updateSubscriber","orderCompletion","updateAccount","updateCustomer","updateOrderLines","createOrder","addContact","addCustomer","addAccount","updateOrder"}

Daily Average Failure Count = 200-400

Observations - At times the failure count spikes to around 3000-5000 due to service failures such as Kafka being down or EventEnricher failures

Reason for Event Failure in Microservice - While Kafka tries to send the message with the provided key and payload, it fails with: "Expiring 16 record(s) for salesorder.reconstruct.request-0:120000 ms has passed since batch creation." This indicates that the Kafka producer attempted to send a batch of 16 records to the salesorder.reconstruct.request topic but could not complete the send within the allowed time (120000 ms).



Error - 2023-07-27 09:12:01.790 ERROR 19352 --- [ad | producer-1] c.t.telecom.tib.producer.EventPublisher  : Message transmission failed for eventID:a47128cb-ab78-410d-be3c-8ad10fb9dcfd, eventType:updateSalesOrder, eventDate:2023-07-27T09:11:00+0530
2023-07-27 09:12:01.790 ERROR 19352 --- [ad | producer-1] c.t.telecom.tib.producer.EventPublisher  : Message transmission failed because of:Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: Expiring 16 record(s) for salesorder.reconstruct.request-0:120000 ms has passed since batch creation
2023-07-27 09:12:01.790 DEBUG 19352 --- [ad | producer-1] c.t.telecom.tib.producer.EventPublisher  : event with eventstatus:{"event":{"eventType":"updateSalesOrder"},"request":{"buID":"DEFAULT","createdDate":"27-Jul-2023 09:11:00","orderId":"SALES_ORDER_10072023_01","opID":"HOB","userID":"SALES_ORDER_BATCH","subStatus":"COMPLETED","status":"COMPLETED"},"eventID":"a47128cb-ab78-410d-be3c-8ad10fb9dcfd","eventDate":"2023-07-27T09:11:00+0530","eventStatus":{"topic-publish":"FAILED"}}
2023-07-27 09:12:01.790 ERROR 19352 --- [ad | producer-1] o.s.k.support.LoggingProducerListener    : Exception thrown when sending a message with key='null' and payload='{"event":{"eventType":"updateSalesOrder"},"request":{"buID":"DEFAULT","createdDate":"27-Jul-2023 09:...' to topic salesorder.reconstruct.request:
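The "has passed since batch creation" expiry above is governed by the producer's delivery timeout. As a rough sketch (not our actual configuration - the broker address and the non-120000 values below are illustrative placeholders), the relevant producer settings look like this:

```java
import java.util.Properties;

public class ProducerTimeoutConfig {
    // Illustrative Kafka producer settings related to the
    // "Expiring N record(s)" timeout. Key names are the standard
    // Kafka producer configuration keys; values are examples only.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        // Total time a record may spend batching and retrying before it is
        // expired -- the 120000 ms seen in the error log above.
        props.put("delivery.timeout.ms", "120000");
        // Upper bound on a single request; must be <= delivery.timeout.ms.
        props.put("request.timeout.ms", "30000");
        // How long the producer waits to fill a batch before sending.
        props.put("linger.ms", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties p = producerProps();
        System.out.println("delivery.timeout.ms=" + p.getProperty("delivery.timeout.ms"));
    }
}
```

If the broker on server 45 is slow or down, records sit in the batch past delivery.timeout.ms and are expired exactly as the log shows, so raising the timeout only delays the failure rather than fixing it.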

Observations - Kafka goes down frequently on server 45


Reconciliation Scripts - currently two scripts are running in ST2

Script-1(captureElasticFailedEvents)
  - captures the targeted events on reconCriteria = 'tibsearchindex': 'SUCCESS'

Script-2(publishElasticFailedEvents)
  - publishes the targeted events on reconCriteria = 'tibsearchindex': 'SUCCESS'
  - The threshold count for retries is set to 5
  - kafka batch size = 25

Observations - all the failed events get published within at most 2 retries
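The retry behaviour of Script-2 can be modelled as a bounded retry loop. This is only a sketch of the pattern, not the actual script: the publish step is stubbed out as a predicate, and the event ID is a sample from the log above.

```java
import java.util.function.Predicate;

public class BoundedRetryPublisher {
    static final int MAX_RETRIES = 5; // retry threshold from Script-2

    // Attempts to publish an event, retrying up to MAX_RETRIES times.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static int publishWithRetry(String eventId, Predicate<String> publish) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            if (publish.test(eventId)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated publisher that fails once, then succeeds --
        // mirroring the observation that events go through by retry 2.
        int[] calls = {0};
        int attempts = publishWithRetry("a47128cb-ab78-410d-be3c-8ad10fb9dcfd",
                id -> ++calls[0] >= 2);
        System.out.println("published after " + attempts + " attempts");
        // prints "published after 2 attempts"
    }
}
```

Since the observed worst case is 2 retries, the threshold of 5 leaves comfortable headroom; the interesting tuning question is what happens when Kafka itself is down and all 5 attempts fail.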


Can you please arrange or rewrite the above report? I need to send it by mail to my manager.

ChatGPT
Subject: Elastic Reconciliation Report for Failed Events - Action Required

Dear [Manager's Name],

I hope this email finds you well. I am writing to provide an update on the recent issues we've been facing with the ST2 microservice and its reconciliation process for failed events. Below is a summary of the observations and actions taken so far:

Server Details:

- Environment: ST2
- Server: 45 (where the microservice runs)

Targeted Events:
The following events have been targeted for reconciliation:
{"addIndividual", "addOrganization", "addPartyRole", "addSubscriber", "createBatchOrder", "createCustomerOrder", "enrichAddon", "enrichAttribute", "enrichPayer", "enrichUser", "orderLineCease", "terminatePartner", "updateCustomerOrder", "updatePartyRole", "updatePartyRoleByAssocId", "updateSalesOrder", "updateSubscriber", "orderCompletion", "updateAccount", "updateCustomer", "updateOrderLines", "createOrder", "addContact", "addCustomer", "addAccount", "updateOrder"}

Daily Average Failure Count:
On average, we are experiencing 200-400 failed events daily.

Reasons for Event Failures:
Most of the event failures are due to issues with Kafka, which is responsible for sending messages with key and payload. Specifically, we have observed the error message: "Expiring 16 record(s) for salesorder.reconstruct.request-0: 120000 ms has passed since batch creation." This indicates that the Kafka producer attempted to send a batch of 16 records related to the salesorder.reconstruct.request topic but was unable to complete it within the allowed time (120000 ms).

Recent Error Log (Example):

- Event ID: a47128cb-ab78-410d-be3c-8ad10fb9dcfd
- Event Type: updateSalesOrder
- Event Date: 2023-07-27T09:11:00+0530
- Error Message: "Message transmission failed because of: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: Expiring 16 record(s) for salesorder.reconstruct.request-0:120000 ms has passed since batch creation."
Observations on Kafka Instability:

- The Kafka service goes down frequently on server 45, leading to event processing interruptions.
Current Reconciliation Scripts:

Script-1 (captureElasticFailedEvents):
- Captures the targeted events with reconCriteria = 'tibsearchindex': 'SUCCESS'.

Script-2 (publishElasticFailedEvents):
- Publishes the captured events with reconCriteria = 'tibsearchindex': 'SUCCESS'.
- Threshold count for retries is set to 5.
- Kafka batch size is set to 25.
Observations on Reconciliation:

- Most failed events are successfully published within 2 retries, but there is still room for improvement, especially in cases of high failure counts.
Action Required:

To address the challenges faced with the reconciliation process, I recommend the following actions:

1. Kafka Stability Improvement:
   Investigate the root cause of Kafka instability on server 45 and take the necessary measures to enhance its reliability. This could involve scaling resources, optimizing configurations, or applying relevant updates.

2. Threshold and Batch Size Optimization:
   Consider tuning the retry threshold in Script-2 based on an analysis of event failure patterns. Adjusting the batch size for Kafka messages may also improve event processing efficiency.

3. Monitoring and Alerting:
   Implement proactive monitoring and alerting for the Kafka service. This will help us identify downtime or performance issues in real time and take prompt action to mitigate them.
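As a first rough illustration of such a check, a plain TCP probe against the broker port can signal whether Kafka is accepting connections at all. The host and port below are placeholders, and a production check would use proper Kafka admin tooling rather than a raw socket:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerProbe {
    // Returns true if a TCP connection to host:port succeeds within
    // timeoutMs -- a crude liveness signal for a Kafka broker.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder address for the broker on server 45.
        boolean up = isReachable("localhost", 9092, 500);
        System.out.println("broker reachable: " + up);
    }
}
```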
4. Failover Mechanism:
   Evaluate the feasibility of a failover mechanism that lets the microservice switch to an alternate Kafka instance if the current one becomes unavailable, improving the overall resilience of the system.
Please let me know if you require any further information or if you would like to discuss the above points in detail. Your guidance and support in addressing these challenges will be crucial in ensuring the smooth functioning of our microservice.

Thank you for your attention to this matter. Looking forward to your feedback and guidance.

Best regards,
[Your Name]
[Your Designation]
[Your Contact Information]




