The troubleshooting guide below will help you to successfully set up and use the StackState AWS integration.
- No AWS data displayed in StackState:
- Not all components are displayed in the topology
- Configuration or tags are not shown for some components
- Metrics not displayed on resources
- Changes are not displayed in real-time
- Deleted AWS resources are not deleted from StackState in real-time
- Newly added resources/features are not displayed after the Agent has been updated
- Errors:
No AWS data displayed in StackState
After installing the StackState AWS StackPack, the AWS synchronization remains in the Waiting for Data state and no data is coming in from AWS.
Error: AssumeRole AccessDenied
Check that the Agent has the correct permissions to use the role created in the target AWS account. In the Agent logs, the error "An error occurred (AccessDenied) when calling the AssumeRole operation" indicates that the Agent does not have the necessary permission to access the role. Here are some steps to check:
- If running the Agent outside of AWS, check that the access key ID and secret access key are specified correctly in the configuration.
- If running the Agent inside of AWS, check that the role given to the EC2 instance or ECS task has the correct trust policy, allowing the respective services to use the role.
- Check that the role created inside of the AWS account to be scanned allows the correct AWS account to assume it.
- Check that the IAM user or role used by the Agent is allowed to assume the StackStateAwsIntegrationRole in the target account.
- Check that the external ID set on the StackStateAwsIntegrationRole is set correctly. The Agent will fail if the external ID does not match the Agent configuration, or if an external ID is not set.
- Check that no Permissions Boundaries or Service Control Policies are preventing assuming the role.
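The trust policy and external ID checks above can be partly automated. The following is a minimal sketch, assuming you have exported the trust policy of the StackStateAwsIntegrationRole as JSON and loaded it into a dictionary. The function names and the example policy are hypothetical, and the check is simplified: it matches on the account only, handles only the StringEquals condition operator, and does not evaluate wildcard actions, Permissions Boundaries or Service Control Policies.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list of values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _in_account(principal, account_id):
    """True if the principal is the bare account ID or an IAM ARN in that account."""
    return principal == account_id or principal.startswith(f"arn:aws:iam::{account_id}:")


def trust_policy_allows(policy, account_id, external_id):
    """Return True if an Allow statement lets `account_id` call sts:AssumeRole
    with exactly this external ID. A statement without an external ID condition
    is treated as a failure, matching the behaviour described above."""
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action")):
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. "*"; not evaluated in this sketch
        if not any(_in_account(p, account_id) for p in _as_list(principal.get("AWS"))):
            continue
        condition = stmt.get("Condition", {}).get("StringEquals", {})
        if external_id in _as_list(condition.get("sts:ExternalId")):
            return True
    return False


# Hypothetical example policy; the account ID and external ID are placeholders.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "my-external-id"}},
        }
    ],
}
```

A result of False only narrows down where to look; the Agent logs remain the authoritative source for the actual error.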
No error
If no error is displayed in the Agent and it appears that snapshots are being created, check the StackState side to ensure that data is being received correctly.
The AWS synchronization logs are accessible here: http://localhost:7070/api/logs/sync/sync.AWS?limit=100. Replace localhost with the name of the StackState instance. If the AWS synchronization has a non-standard name, you can find all logs here: http://localhost:7070/api/logs
If the log files don't yield any clues, we will need to dig a bit deeper into the internals of StackState. StackState collects data from external sources in a topic on the Kafka bus. Using the URL http://localhost:7070/api/topic we can see all Kafka topics. There should be a topic with the prefix sts_topo_aws_v2_.
Using the StackState CLI, we can also inspect messages on the topic to verify that the instance is receiving data. See our documentation for the instructions.
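As a convenience, the URLs in this section can be rewritten for your own instance with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the host name in the usage note is a placeholder, and the format of the response returned by the topic URL is not assumed here, so `aws_topics` simply takes a list of topic names that you have already extracted.

```python
from urllib.parse import urlsplit, urlunsplit

# URLs exactly as given in this guide.
DOC_URLS = {
    "aws_sync_logs": "http://localhost:7070/api/logs/sync/sync.AWS?limit=100",
    "all_logs": "http://localhost:7070/api/logs",
    "topics": "http://localhost:7070/api/topic",
}

AWS_TOPIC_PREFIX = "sts_topo_aws_v2_"


def for_instance(url, host):
    """Replace the host in `url` with `host`, keeping port, path and query."""
    parts = urlsplit(url)
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit(parts._replace(netloc=f"{host}{port}"))


def aws_topics(topic_names):
    """Keep only the topic names that carry the AWS topology prefix."""
    return [name for name in topic_names if name.startswith(AWS_TOPIC_PREFIX)]
```

For example, `for_instance(DOC_URLS["aws_sync_logs"], "stackstate.example.com")` yields the synchronization log URL for a StackState instance reachable under that placeholder name. An empty result from `aws_topics` suggests that no AWS topology data has reached StackState.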
Not all components are displayed in the topology
If you are in the process of migrating from the AWS (Legacy) StackPack to the new AWS StackPack, ensure that only one instance exists per AWS account at any time. If an AWS account is configured in both the legacy and new StackPacks at the same time, components will not be displayed properly.
It is likely that the Agent was not able to access these resources. If the Agent was able to successfully complete a snapshot but some services or resources were not successfully read, these will be emitted as warnings. In that case, a configuration issue in the AWS environment is the likely cause:
- Check that the Agent has permission to read the resource. If the Agent gets an AccessDenied error, it will suggest the IAM permission that may be missing. There are several reasons why a permission is not available:
  - The IAM policy used by the Agent has been modified and the permission removed. If this IAM role was set up using the CloudFormation template, this can be checked by detecting stack drift (docs.aws.amazon.com).
  - The role has a Permission Boundary (docs.aws.amazon.com) that overrides the permissions that it may grant.
  - The AWS account has a Service Control Policy (docs.aws.amazon.com) that limits the permissions that any principal in the account can grant. If this is the case, check with your AWS administrator.
- There is an issue with an AWS service. If the Agent is not able to access an AWS service, or the data returned by an API call is not in the format the Agent expects, the entire service will not be displayed. First check your Personal Health Dashboard, and if no issues are displayed, submit a bug report for the AWS StackPack.
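The stack drift check mentioned in the list above can also be run from a script. The following is a minimal sketch, assuming a boto3 CloudFormation client is passed in as `cfn` (for example `boto3.client("cloudformation")`) and that you know the name of the stack deployed from the StackState template. The function name, polling interval and retry count are illustrative assumptions, and only the first page of drifted resources is read.

```python
import time


def detect_drift(cfn, stack_name, poll_seconds=5, max_polls=60):
    """Start drift detection and return (stack_drift_status, drifted_logical_ids).

    `cfn` is expected to behave like a boto3 CloudFormation client.
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Drift detection is asynchronous, so poll until it leaves the in-progress state.
    for _ in range(max_polls):
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("drift detection did not finish in time")

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))

    # Only resources that were changed or removed outside of CloudFormation.
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )["StackResourceDrifts"]
    return status["StackDriftStatus"], [d["LogicalResourceId"] for d in drifts]
```

A status of DRIFTED together with an IAM role or policy in the returned list would point to a manually modified permission set.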
Configuration or tags are not shown for some components
First, check that the component supports showing additional configuration or tags. If you have seen this data in the past, but it is not showing now, it is likely a permission issue. The Agent will attempt to show all data it can gather, but if a specific permission is missing, then it will fail to display the data. Every AWS service handles permissions slightly differently, but in general they fall under 3 categories:
- List: Give a list of all the identifiers of resources that live in an account. If this permission is not granted, no resources will be displayed at all.
- Describe: Give the configuration of the item. If this permission is not granted, the component will be displayed but the configuration will be empty.
- Tags: Give a list of all tags applied to the item. Not every resource supports this. If this permission is not granted, no tags will be shown in the configuration.
Check the Agent logs to see if there are any warnings concerning permissions issues. If so, follow the troubleshooting steps shown in Not all components are displayed in the topology.
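The effect of the three permission categories can be summarised as a small lookup. This is an illustrative sketch only, not how the Agent is implemented: the function name is hypothetical, and it assumes that the Describe and Tags permissions are independent of each other, which may not hold for every AWS service.

```python
def visible_data(granted):
    """Map granted permission categories to what is shown for a component.

    `granted` is a set containing any of "list", "describe" and "tags".
    """
    if "list" not in granted:
        # Without List, the resource is never discovered.
        return {"component": False, "configuration": False, "tags": False}
    return {
        "component": True,
        "configuration": "describe" in granted,
        "tags": "tags" in granted,
    }
```

For instance, `visible_data({"list", "tags"})` describes a component that appears in the topology with tags but with an empty configuration, which points to a missing Describe permission.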
Metrics not displayed on resources
The CloudWatch plugin used to retrieve metrics from AWS is separate from the Agent, and is configured on the StackPacks page inside of StackState. Check that the role ARN, external ID and access keys are identical to the ones used by the Agent. StackState then gathers metrics directly and on demand, rather than through periodic updates as the Agent does.
This can also be verified in Synchronizations and Data sources in the StackState Settings:
- In Settings > Topology Synchronization > Synchronization there should be a synchronization named AWS.
- In Settings > Topology Synchronization > Sts Sources, there should be an AWS datasource. If it is there, select Edit from the triple-dot menu and test the connection using the Test Connection button.
- In Settings > Telemetry Sources > CloudWatch Sources, there should be a CloudWatchSource datasource. If it is there, select Edit from the triple-dot menu and test the connection using the Test Connection button.
Changes are not displayed in real-time
For all real-time data, you should expect a delay of up to 2 minutes before changes are displayed. For efficiency, events are batched in groups of 1 minute, and the Agent checks for new events every minute.
If real-time updates are showing but take a significant time to appear (15+ minutes), check the Agent logs to ensure that it can access the S3 bucket correctly. If the Agent can't reach the S3 bucket, it will fall back to the much slower method of reading CloudTrail logs directly, which have a 15 minute delay. VPC Flow Logs and EventBridge events will not be available at all.
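The timings above can be turned into a rough rule of thumb. The sketch below is an assumption-laden illustration: the thresholds come from the figures in this section (a 1 minute batch plus a 1 minute polling interval, and the 15 minute CloudTrail delay), while the function name and the category labels are hypothetical.

```python
BATCH_MINUTES = 1        # events are batched in groups of 1 minute
POLL_MINUTES = 1         # the Agent checks for new events every minute
EXPECTED_MAX_MINUTES = BATCH_MINUTES + POLL_MINUTES  # up to 2 minutes
CLOUDTRAIL_DELAY_MINUTES = 15


def classify_delay(observed_minutes):
    """Classify an observed delay between an AWS change and its display."""
    if observed_minutes <= EXPECTED_MAX_MINUTES:
        return "expected"
    if observed_minutes >= CLOUDTRAIL_DELAY_MINUTES:
        # Consistent with the fallback to reading CloudTrail logs directly.
        return "cloudtrail-fallback-suspected"
    return "slower-than-expected"
```

A result of "cloudtrail-fallback-suspected" is a hint to check S3 access in the Agent logs, not a confirmed diagnosis.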
If no real-time updates are appearing at all, but hourly full snapshots are still working:
- Check the S3 bucket to ensure that log files for EventBridge are being delivered once per minute. If this is not the case, the Kinesis Firehose Data Stream may be configured incorrectly or have issues. If the resources were deployed using the CloudFormation template, check for stack drift (docs.aws.amazon.com) to see if they were manually modified.
- Check the Agent logs to ensure that it has the correct permissions to access S3 or CloudTrail. If it has access to neither, then no real-time data will be displayed. As a failure to read real-time logs does not impact the running of the Agent, such failures are emitted as warnings.
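To verify the once-per-minute delivery mentioned in the first check, you can compare the modification times of the objects in the bucket. The following is a minimal sketch, assuming you have already collected the timestamps (for example the LastModified values from an S3 object listing). The function name is hypothetical, and the 2 minute tolerance is an assumption chosen to allow for some jitter.

```python
from datetime import datetime, timedelta, timezone


def delivery_gaps(timestamps, max_gap=timedelta(minutes=2)):
    """Return (previous, next) pairs of delivery times that are further
    apart than `max_gap`. An empty list means delivery looks regular."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical delivery times with one 10 minute interruption.
start = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
example = [start + timedelta(minutes=m) for m in (0, 1, 2, 12, 13)]
gaps = delivery_gaps(example)
```

In this example, `gaps` holds a single pair covering the interval between 12:02 and 12:12, which is where the delivery stream would need to be investigated.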
Deleted AWS resources are not deleted from StackState in real-time
StackState currently does not support deleting items when using partial snapshots, which are used to generate updates to the topology in real-time. The component will be removed from the topology on the next hourly full snapshot. This functionality may be added in the future.
Newly added resources/features are not displayed after the Agent has been updated
The CloudFormation template in each AWS account must also be updated, as the template will include new IAM permissions to access the new resources.
Errors
Error: "stackstate-logs-<accountid> already exists" when creating a CloudFormation stack
This error is caused while attempting to create the S3 bucket. S3 bucket names must be globally unique, for all users of AWS. The CloudFormation template creates a bucket suffixed with the unique AWS account ID to ensure that the bucket name is unique, but there can still be a small chance that the bucket name is already in use.
- Check that the bucket has not already been created in the target AWS account. If it has, check whether the CloudFormation stack has already been deployed.
- If the bucket does not exist in the account, then the bucket name has already been taken by another account.
- Open the CloudFormation template in a text editor and locate the section named StsLogsBucket.
- Under this item, find the line named "BucketName". Change the value in this line to something unique, and deploy the modified template.
- In the Agent, modify the conf.yaml file for the AWS check. Find the section under instance_info that corresponds to the target AWS account.
- Add a new line log_bucket_name: <name>, replacing <name> with the unique name of the bucket.
Note that steps 2-4 must be repeated for any new CloudFormation template update.
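Before deploying the modified template, it can help to check that the replacement name is a legal S3 bucket name. The sketch below covers the main general-purpose bucket naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, no adjacent dots, not formatted like an IP address). It is not exhaustive, it cannot tell whether a name is already taken, and the helper names and the suffix scheme are hypothetical.

```python
import re

_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name):
    """Check a candidate name against the main S3 bucket naming rules."""
    if not _BUCKET_RE.match(name):
        return False
    if ".." in name:
        return False
    if _IP_RE.match(name):
        return False
    return True


def candidate_name(account_id, suffix):
    """Build a candidate by appending a suffix to the default name pattern."""
    return f"stackstate-logs-{account_id}-{suffix}"
```

For example, `is_valid_bucket_name(candidate_name("123456789012", "eu1"))` passes the check, and the same value would then be used both for "BucketName" in the template and for log_bucket_name in conf.yaml.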
Error: "bucket is not empty" when deleting a CloudFormation stack
S3 buckets can't be deleted if they have objects inside of them. The uninstallation guide for the AWS StackPack provides steps for how to delete all items in the bucket before removing the CloudFormation stack.
- The S3 bucket is versioned, so deleting objects in the normal way will not delete them. Instead it will add a delete marker to the objects. The steps shown in the documentation show how to delete object versions, which will entirely delete the object.
- A new object may have been added to the bucket in between the emptying of the bucket, and the deletion of the bucket. Ensure that all event sources have been disabled beforehand, to prevent them from adding new objects.
Once the bucket has been properly emptied, try running the CloudFormation delete again. Alternatively, the dialog box will give an option to retain the resource, and the bucket can then be deleted manually afterwards.
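To illustrate why a versioned bucket needs version-aware deletion, the sketch below builds the delete requests for every object version and delete marker in one page of a version listing. It is not the procedure from the uninstallation guide, which remains the reference. It assumes the page has the shape returned by the boto3 S3 call `list_object_versions`, and the function name is hypothetical.

```python
def delete_batches(page, batch_size=1000):
    """Build `Delete` payloads for S3 `delete_objects` from one page of a
    `list_object_versions` response.

    Both real versions and delete markers must be removed, each identified
    by its key and version ID. S3 accepts at most 1000 keys per request.
    """
    entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
    objects = [{"Key": e["Key"], "VersionId": e["VersionId"]} for e in entries]
    return [
        {"Objects": objects[i:i + batch_size], "Quiet": True}
        for i in range(0, len(objects), batch_size)
    ]


# Possible usage with boto3 (not executed here; `bucket` is a placeholder):
#
#   s3 = boto3.client("s3")
#   for page in s3.get_paginator("list_object_versions").paginate(Bucket=bucket):
#       for batch in delete_batches(page):
#           s3.delete_objects(Bucket=bucket, Delete=batch)
```

Deleting by version ID removes the data permanently, so double-check the bucket name before running anything like this.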
Error: "iconbase64: Must be a valid icon" when installing an AWS StackPack instance
The AWS StackPack is supported on StackState versions 4.4 and above. Attempting to install the StackPack on earlier versions will fail with this error. If upgrading StackState is not an option, use the AWS (Legacy) StackPack.
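The version requirement can be expressed as a simple comparison. This is a minimal sketch, assuming the StackState version is available as a dotted string such as "4.3.1"; the function name is hypothetical.

```python
def supports_aws_stackpack(version):
    """Return True if a dotted StackState version string is 4.4 or above."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    # Compare as numbers, so that 4.10 is correctly treated as newer than 4.4.
    return (major, minor) >= (4, 4)
```

Comparing numeric tuples rather than strings avoids the pitfall where "4.10" sorts before "4.4" alphabetically.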
Need help?
If the above doesn't resolve the issue, contact our support team for help. Please provide them with the logs mentioned in the section No AWS data displayed in StackState.