Note: This post was written for Splunk 4.2 on Linux. Some concepts may be similar for other versions but you should consult the Splunk docs.
I've been hands-on with a Splunk implementation again and found myself looking for good documentation on the Deployment Server functionality to help train others. The docs on Splunk.com are quite good, but they assume serious Splunk experience before the settings and configuration options for centralized, automated deployment make sense. This post contains information for those starting out with the Deployment Server.
1. Planning Deployments
Splunk Deployment Server functionality is designed to automate the deployment of policies, configurations, settings, and apps to remote Splunk components within your environment using a basic client-server model. Depending on your needs, this basic model can be extended in a variety of ways, some of which are discussed here.
Basic Deployment Pattern
The basic model for implementing a Deployment Server within most environments requires a simple set of classes, one for each type of Splunk component: search heads, indexers, and forwarders (heavy, light, or universal). This diagram depicts how configurations and apps are pushed out from the Deployment Server to each class of system within the environment.
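For instance, a minimal serverclass.conf for this pattern might look like the sketch below; the class names, addresses, and app names are illustrative only, not prescriptive:

```
# serverclass.conf on the Deployment Server (illustrative names/addresses)
[global]
whitelist.0 = *

# One class per component type
[serverClass:searchheads]
whitelist.0 = 10.1.1.*

[serverClass:searchheads:app:sh-base]
restartSplunkd = true

[serverClass:indexers]
whitelist.0 = 10.1.2.*

[serverClass:indexers:app:idx-base]
restartSplunkd = true

[serverClass:forwarders]
whitelist.0 = 10.1.3.*

[serverClass:forwarders:app:fwd-base]
restartSplunkd = true
```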
Subclass Deployment Pattern
For those who require an additional layer of flexibility in their configurations and applications, whether to separate logging facilities or to apply different retention policies to specific data, this design pattern uses subclasses to divide the indexers into idx-central and idx-critical.
Subclasses provide a basic construct for dividing configurations and/or apps according to your requirements. Determine what is different about each class and develop a separate deployment package for each subclass. In this case, retention and index storage may differ between the subclasses, which requires configuration files specific to each, as sketched below.
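For example, the two indexer subclasses might differ only in their indexes.conf retention settings; a sketch, with index names, paths, and retention values as examples only:

```
# deployment-apps/idx-central/local/indexes.conf
[central]
homePath   = $SPLUNK_DB/central/db
coldPath   = $SPLUNK_DB/central/colddb
thawedPath = $SPLUNK_DB/central/thaweddb
frozenTimePeriodInSecs = 7776000    # ~90 days

# deployment-apps/idx-critical/local/indexes.conf
[critical]
homePath   = $SPLUNK_DB/critical/db
coldPath   = $SPLUNK_DB/critical/colddb
thawedPath = $SPLUNK_DB/critical/thaweddb
frozenTimePeriodInSecs = 31536000   # ~365 days
```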
Distributed Deployment Pattern
Some organizations may wish to have separate deployment servers in each data center or remote office, both to reduce failure domains and to enhance the security of each environment by alleviating the need to distribute configuration files across domains. In this pattern, a master deployment server serves both as a deployment server for specific components and as a master for other slave deployment servers.
This is a complex pattern and requires a significant amount of testing to get the configurations correct, but it can be very powerful for multi-data center environments.
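One way to wire this up is to make each slave deployment server a deployment client of the master; a hedged sketch, with the hostname and port as placeholders:

```
# deploymentclient.conf on a slave deployment server, making it a
# deployment client of the master (hostname/port are placeholders)
[deployment-client]
# land received packages in deployment-apps so they can be re-served
repositoryLocation = $SPLUNK_HOME/etc/deployment-apps

[target-broker:deploymentServer]
targetUri = master-ds.example.com:8089
```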
2. Identify Infrastructure
Depending on the pattern you chose during the previous step, you will need a number of servers to support your Splunk implementation. For the Deployment Server, I'd pick a server that is not part of the rest of your Splunk infrastructure. Why?
- Systems that run other Splunk components will become bogged down by deployment functionality in larger environments (more than 100 Splunk servers), and deployment servers should themselves be distributed once a single one begins to support approximately 300 nodes.
- When you add Deployment Server functionality to an existing Splunk server, that system ends up configured differently from other servers with similar functionality enabled. The Deployment Server uses separate folders and configurations from other component infrastructure, i.e. deployment-apps, apps, and serverclass.conf. This means you will either have to include that server in the Deployment Server configurations for the component functionality already present or configure it separately as an exception. Both can be tricky.
- When troubleshooting, it is easier to separate Deployment Server issues from component issues when the Deployment Server exists independently of other functionality.
- In large environments, distributing deployment functionality ensures stability, facilitates automation and policy integrations, and strengthens the security model.
If you don't have a separate system for the Deployment Server initially, don't fret. Just treat the system as though its Deployment Server functionality were separate when configuring and troubleshooting, and plan to move it to its own server as your Splunk infrastructure grows.
3. Install Splunk
You will want to install the most recent version of Splunk on each system. The Deployment Server should always be kept up to date: a current Deployment Server is backwards compatible with older agents, but agents are not backwards compatible with an out-of-date Deployment Server. It is also good practice to create an install script that can be leveraged to automatically update servers whose configuration files are still compatible; a sketch follows.
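A minimal sketch of such a script for a tarball install on Linux; the package path and install location are placeholders for your environment:

```
#!/bin/sh
# Upgrade Splunk in place from a tarball (placeholder paths).
SPLUNK_HOME=/opt/splunk
PACKAGE=/tmp/splunk-latest-Linux-x86_64.tgz

"$SPLUNK_HOME/bin/splunk" stop
tar -xzf "$PACKAGE" -C /opt                  # unpack over /opt/splunk
"$SPLUNK_HOME/bin/splunk" start --accept-license --answer-yes
```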
4. Configure the Deployment Server
It's simple; follow these steps (a sample layout follows the list):
- Choose a name for the deployment package you are going to create.
- You will add your deployment packages to /opt/splunk/etc/deployment-apps/ on the Deployment Server.
- Within the deployment package, create the configuration files, or place the apps, that you will be distributing to other Splunk servers.
- Use the local directory within your package to deploy configuration settings. (/opt/splunk/etc/deployment-apps/package1/local)
- Configure serverclass.conf with rules to push out apps and configurations by deployment class on the Deployment Server.
- Configure deploymentclient.conf on Client Servers to check in with the correct Deployment Server. This will also set the phoneHome feature for your deployment client.
- Reload the Deployment Server.
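Putting those steps together, the layout on the Deployment Server ends up looking something like this (package names are examples):

```
/opt/splunk/etc/system/local/serverclass.conf        # deployment rules
/opt/splunk/etc/deployment-apps/package1/local/      # configs to push
/opt/splunk/etc/deployment-apps/package1/local/inputs.conf
/opt/splunk/etc/deployment-apps/deployment-clients/local/deploymentclient.conf
```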
Create Deployment Packages
Organizational skills really help when developing deployment packages. During planning you defined your architecture and created classes, so you already know how they differ. Create classes and subclasses by identifying the different use cases for each Splunk component type. For example, if you have central indexers and an indexer reserved for critical security events, you might set up the classes idx-central and idx-critical.
Each deployment package should be developed with the functionality of the Splunk component type in mind. Name each deployment package for the class or subclass of the bundle so that rules and packages coincide, which makes troubleshooting easier. Each package should contain only the conf files and apps necessary for that class of system to operate effectively. For example, if you are configuring an indexer, include an inputs.conf with a splunktcp stanza defined, as in the sketch below.
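A minimal sketch of that indexer inputs.conf; 9997 is the conventional (but not mandatory) receiving port:

```
# inputs.conf in an indexer package: listen for forwarded data
[splunktcp://9997]
disabled = false
```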
Basics required for each Splunk component type (a sample outputs.conf follows the list):
- Deployment Server: serverclass.conf
- Search Head: web.conf, authentication.conf, authorize.conf, server.conf
- Indexers: authentication.conf, server.conf, inputs.conf, web.conf
- Heavy Forwarders: outputs.conf, web.conf
- Light Forwarders: outputs.conf, web.conf
- Universal Forwarders: outputs.conf, web.conf
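As a concrete example from this list, a forwarder package's outputs.conf might look like the following sketch, with placeholder indexer addresses:

```
# outputs.conf in a forwarder package (indexer addresses are examples)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = 10.1.2.1:9997, 10.1.2.2:9997
```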
Likewise, note that deployment clients in the field may need their own information and settings; these should go in a separate deployment-clients package containing everything a deployment client requires, such as phoneHome settings, in a single bundle. That bundled package can then be added to all Splunk components and centrally managed.
Tip: Consider using a spreadsheet to plan out your deployment classes, identifying the conf files and apps that will be required for each class.
Configure serverclass.conf
Once I realized that this file was much like a firewall rule set, it became easy to understand what was needed and how to set up complex deployment scenarios. After all, some of the same firewall concepts apply: rule order and specificity matter, wildcards can be applied, and classes can be defined either at the top level or granularly. An example follows the list below.
- Place a global whitelist rule with a wildcard at the top of the serverclass.conf file.
- Use IPs (10.1.1.1) instead of hostnames (abc-def01.domain).
- Use VLANs (10.1.1.X) instead of naming prefixes (abc-defXX.domain.com).
- Try to narrow the scope of wildcards within deployment classes. (10.1.1.* instead of 10.1.*)
- If you whitelist or blacklist, check and double check that your numbering is ordered properly. (0,1,2,3)
- Much of the Deployment Server's complexity comes from filtering and the order of rules within serverclass.conf, and the two are paired: if you specify whitelisting as the filter type, whitelists are applied before blacklists; if you choose the opposite, blacklists are applied before whitelists.
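Tying these rules together, here is a hedged serverclass.conf fragment showing sequential numbering, a narrow wildcard, and a blacklist exception; the addresses and class names are placeholders:

```
[global]
whitelist.0 = *

[serverClass:idx-central]
# sequential numbering: 0, 1, 2, ...
whitelist.0 = 10.1.1.*
# exclude the critical-events indexer from the central class
blacklist.0 = 10.1.1.12

[serverClass:idx-critical]
whitelist.0 = 10.1.1.12
```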
Rolling out deploymentclient.conf
Splunk's deployment functionality requires setting up a deployment client so that each remote system will phone home. This is an important part of the Deployment Server configuration, since it helps ensure that remote systems have the latest configurations and apps, and it gives an administrator a single place to check on all deployed clients.
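A minimal deploymentclient.conf sketch for the clients; the target URI and interval are examples only:

```
# deploymentclient.conf pushed to every client (URI/interval are examples)
[deployment-client]
phoneHomeIntervalInSecs = 600

[target-broker:deploymentServer]
targetUri = deploy.example.com:8089
```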
Reload instead of Restart
Splunk provides a reload function for the server that re-reads the Deployment Server configuration and saves you from having to restart the splunk service.
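A typical invocation on the Deployment Server looks like this (the path assumes the default install location):

```
# Re-read serverclass.conf and deployment-apps without a restart
/opt/splunk/bin/splunk reload deploy-server
```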
5. Troubleshooting and Debugging
There are a few common pitfalls when configuring deployment server functionality that can be avoided. Here are some of the ones that we encountered and what was done to correct them:
- Deployed configuration being trumped by local configuration - If someone logs into a system, changes a configuration located in system/local, and then restarts Splunk, you may encounter issues with that component's functionality, because settings in system/local take precedence over deployed settings. Those unfamiliar with the Deployment Server may try to make direct changes to configuration files that conflict with existing deployed settings. This is actually more common than you would think and requires configuration monitoring to prevent. See Splunk FIM.
- Conf files that require parameters - If you have copied and pasted files, then settings like hostname= are invariably going to be incorrect on each system and will likely cause problems. To correct this, consider using environment variables within your configuration files to create parameter-based configurations specific to each host.
- Hostnames not resolving - I know I mentioned this in the serverclass.conf section above, but I will explain why here. I found that hostnames did not resolve properly due to DNS issues, which caused settings on some systems not to get updated. Switching to IP addresses in serverclass.conf helped resolve many of these issues.
- To forward or not to forward - When a serverclass.conf rule that does not work properly is followed by rules with wildcards, you may find that settings and apps get pushed to the wrong classes. Be sure to number your rules properly and review your serverclass.conf before reloading.
- SSL password hashes not correct - If you are running SSL within your Splunk infrastructure, and you should be, you are likely to encounter this issue if you copy configurations from existing systems while developing your deployment packages. For SSL to operate properly, the password (sslKeysfilePassword) must be the same in server.conf as it is in the other configuration files for the same system. If it differs, SSL will not operate properly and you will see a blocking effect that prevents messages from being delivered or functionality from working, along with error messages that read: SSL error or Can't read key file. Determine which password is incorrect and fix it, or delete the keys and start over.
- Indexers not running as expected - When indexers stop running, there can be a variety of reasons. First and foremost, roll out web.conf and enable the web interface for your indexers so that you can investigate issues more rapidly (see the sketch below). Using this method, I found that forwarder configurations had been deployed to indexers, which caused them to stop running; that inputs.conf had incorrect attribute definitions, which prevented logs from being indexed; and at one point, that the indexers had been moved and indexes corrupted, which required splunk fsck to be run before they would restart.
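For reference, enabling the web interface on the indexers is a one-line web.conf change; a minimal sketch:

```
# web.conf pushed to indexers to enable Splunk Web for validation
[settings]
startwebserver = 1
```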