Monday, November 9, 2015

Create a new Splunk Index


I wanted to collect some data that no one else would be interested in. It had no need to be retained and was purely a one-off for test information. In short, a good case for a temporary index.
In this example, I'm going to show a simple scripted input and a simple logfile monitor. 

We use a deployment server and our indexers are not replicated, so this is fairly simple.

1. Create the new index in the indexes.conf that gets sent to your indexers;

tail /splunk/etc/deployment-apps/all_indexers/indexes.conf
[throwaway]
homePath   = volume:primary/throwaway
coldPath   = volume:primary/throwaway/colddb
thawedPath = $SPLUNK_DB/throwaway/thaweddb
tstatsHomePath = volume:primary/throwaway/datamodel_summary
summaryHomePath = volume:primary/throwaway/summary

maxMemMB = 20
maxHotBuckets = 10
maxConcurrentOptimizes = 6
maxTotalDataSizeMB = 1024
maxWarmDBCount = 30
Note: our data lives on EMC storage, so it ages and migrates there according to other requirements.

These might be decent settings for short retention;
# 1 GB bucket
maxDataSize = 1024
# roll a hot bucket after 1 day idle
maxHotIdleSecs = 86400
# 30 warm buckets = roughly 30 days
maxWarmDBCount = 30
# 90 days in seconds; after this, cold rolls to frozen
frozenTimePeriodInSecs = 7776000
# coldToFrozenDir is not set, so once 90 days have passed the data is deleted
#coldToFrozenDir =
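The retention math above is just unit conversion; a quick sanity check in the shell:

```shell
# 90 days expressed in seconds, the value used for frozenTimePeriodInSecs
days=90
frozen_secs=$((days * 24 * 60 * 60))
echo "$frozen_secs"   # 7776000
```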
2. Create a new app. Mine is in /splunk/etc/deployment-apps/throwaway_app;
.
|-- bin
|   `-- script.sh
|-- local
|   |-- app.conf
|   |-- inputs.conf
|   `-- props.conf
`-- metadata
    |-- default.meta
    `-- local.meta
cat bin/script.sh
#!/bin/bash
# -b runs top in batch mode so its output can be parsed from a script
top -b -n 1 | grep splun[k] | awk '{print $3" Virt:"$6" Res:"$7}'
ps -ef | grep splun[k]
 cat local/inputs.conf
[script:///opt/splunkforwarder/etc/apps/throwaway_app/bin/script.sh]
disabled = false
index = throwaway
# Run every 15 minutes
interval = 900
source = throwaway.script
sourcetype = script:///opt/splunkforwarder/etc/apps/throwaway_app/bin/script.sh

[monitor:///opt/logfile.log]
disabled=false
index = throwaway
sourcetype = throwaway.logfile
cat local/props.conf
[top]
TZ = America/Chicago
DATETIME_CONFIG = CURRENT
Note: You'll need to specify where the scripts will live (I need to set some system variables).
Note: This is where you direct the data to the index you want it in.
Note: For the love of everything holy, put the timezone in there so you'll be able to find any non-timestamped data later.


Next, update the appropriate serverclass.conf so the app can be pulled down by the targeted hosts.
#New app for a short term index
[serverClass:throwaway_app]
restartSplunkd = true
whitelist.0 = sljdsb02*
[serverClass:throwaway_app:app:throwaway_app]
At this point you can make everything available to the indexers and the forwarding hosts;

./splunk reload deploy-server
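Once the clients phone home, a quick way to confirm the app landed on a forwarder (paths match this example; the deploy-server check is shown as a comment since it needs a running Splunk):

```shell
# Check on a forwarder that the deployed app arrived
APP=throwaway_app
if [ -d "/opt/splunkforwarder/etc/apps/$APP" ]; then
    status="deployed"
else
    status="not deployed yet"
fi
echo "$APP: $status"
# From the deployment server, you can also list the phone-home clients:
# /splunk/bin/splunk list deploy-clients
```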




Thursday, October 8, 2015

Quick and dirty scripted inputs


Our storage team wanted the output of the powermt command periodically indexed into Splunk so that they could run alerts against that data.

[root@entltdbb02:apps]$  powermt display
VNX logical device count=142
XtremIO logical device count=13
==============================================================================
----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors
==============================================================================
   4 iSCSI Initiator over TCP/I    optimal     142      0       -     0   2218
   6 iSCSI Initiator over TCP/I    optimal     142      0       -     6      0
   8 iSCSI Initiator over TCP/I    optimal     142      0       -     7      0
  10 iSCSI Initiator over TCP/I    optimal     142      0       -     2   1367
  19 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  20 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  21 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  22 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  23 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  24 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  25 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0
  26 iSCSI Initiator over TCP/I    optimal      13      0       -     0      0


There were 6 systems that needed this alert run against them; here's what I did to make that happen. It's not the most elegant solution, but it was quick and effective enough.

1. Create an app on the deployment server that contains basic app settings, a script, instructions on when to run the script and how to manage the output.

2. Create a class of servers to let Splunk know which servers to include in the app.

3. Enable the Splunk user to run the command on the target hosts via sudo.


Step 1 details:

1.a) Create the app structure;
mkdir /splunk/etc/deployment-apps/powermt/
mkdir /splunk/etc/deployment-apps/powermt/bin/
mkdir /splunk/etc/deployment-apps/powermt/local/
mkdir /splunk/etc/deployment-apps/powermt/metadata/
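The same layout can be built in one call with mkdir -p and brace expansion; demoed here under a temp directory standing in for the real path:

```shell
# BASE stands in for /splunk/etc/deployment-apps on the deployment server
BASE=$(mktemp -d)
mkdir -p "$BASE"/powermt/{bin,local,metadata}
ls "$BASE"/powermt
```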

1.b) Write out basic app settings
cat /splunk/etc/deployment-apps/powermt/local/app.conf
 [default]
 [ui]
 is_visible = false

 [install]
 state = enabled


1.c) Create a script to run the command in the bin directory.
cat /splunk/etc/deployment-apps/powermt/bin/powermt.sh
 #!/bin/bash
 sudo powermt display


1.d) Write local/inputs.conf to describe what to run and when to run it. Note that the script location references its ultimate destination on the host.
 cat /splunk/etc/deployment-apps/powermt/local/inputs.conf
 
##### Powermount scripted Inputs ######
[script:///opt/splunkforwarder/etc/apps/powermt/bin/powermt.sh]
 ## Run every 15 minutes
 disabled = false
 interval = 900
 source = powermt
 sourcetype = script:///opt/splunkforwarder/etc/apps/powermt/bin/powermt.sh


1.e) Write local/props.conf to record the time for the script event.
cat /splunk/etc/deployment-apps/powermt/local/props.conf
 [powermt]
TZ = America/Chicago

DATETIME_CONFIG = CURRENT

*Using a TZ is critical. I mean it. Ask me how I know.
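A quick illustration of why the TZ matters: the same epoch instant rendered in two zones (GNU date assumed):

```shell
# Epoch 0 rendered in UTC and in America/Chicago (CST, UTC-6 in winter)
epoch=0
utc=$(TZ=UTC date -u -d "@$epoch" '+%Y-%m-%d %H:%M')
chi=$(TZ=America/Chicago date -d "@$epoch" '+%Y-%m-%d %H:%M')
echo "UTC:     $utc"
echo "Chicago: $chi"
```

Without the TZ setting, Splunk may stamp events six hours off, and you'll never find them with a time-bounded search.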



Step 2 details:

2.a) Create a server class to distribute the app to the correct servers. Because this is a new server class, Splunk will need to be restarted.
 #Checking on the powermt connection
 [serverClass:powermt]

 restartSplunkd = true
 whitelist.0 = entlpdbc01*
 whitelist.1 = entlpdbc02*
 whitelist.2 = entlpdb07*
 whitelist.3 = entltdbb01*
 whitelist.4 = entltdbb02*
 whitelist.5 = entltdb07*

 [serverClass:powermt:app:powermt]


Step 3 details:

3.a) Because this script requires root access and the splunk user runs it, I needed to add a sudo entry with EMC settings, etc. for each server. Here's what my entry looks like.

#========EMC COMMAND ACCESS===========
# User alias specification
User_Alias CMGU=splunk
# Cmnd alias specification
Cmnd_Alias CMGEMC=/tmp/nl_dwd/inq,/sbin/powermt
# User privilege specification
root ALL=(ALL) ALL
CMGU ALL=NOPASSWD:CMGEMC
#=========================================


After the clients phone home and pick up the new app, the data shows up in Splunk;



Again, this was quick and dirty and can certainly be cleaned up or made part of a broader "input script" support environment.

Wednesday, September 9, 2015

Splunk and Certs


I inherited a Splunk installation and haven't spent as much time with the search functions as our development teams have, but I've seen it do some really cool stuff. My role is pretty much to make sure that it stays up and to help folks when they think something is broken.

This installation probably has a lot more moving parts than it really needs and I only spend a little time on it, so it's not unusual to get a surprise from it every now and then.

Finding out that data wasn't getting indexed was Splunk's way of telling me that I had some expired certs. Our infrastructure uses a lot of intermediate forwarders; I believe the reason is that our indexers used to store data on NFS. That's not something Splunk support endorses, and we've since changed it (we now run on block storage), but we're still left with 7 different environments and a total of 12 heavy forwarders (several in HA configurations) that had expiring certs.

Each forwarder has a cert called forwarder.pem, and ours expired on 3/2018.
STEP 1.a
lgs06: Expiration check
find /splunk/etc/deployment-apps/ -name forwarder.pem | xargs -I crt sh -c 'ls crt; openssl x509 -in crt -text -noout | grep "Not After"'
STEP 1.b
lgs06: Expiration fix
/splunk/bin/splunk createssl server-cert -d /splunk/etc/auth/ -n forwarder -p
password is <redacted>
Copy the file to each of the apps that require it. You found them in STEP 1.a
cp /splunk/etc/auth/forwarder.pem <app_location/certs>
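Copying by hand gets tedious with many apps; here's a sketch of the fan-out, demoed against a scratch tree (swap ROOT for /splunk/etc/deployment-apps on the real deployment server):

```shell
# scratch tree standing in for the deployment-apps directory
ROOT=$(mktemp -d)
mkdir -p "$ROOT/app_a/certs" "$ROOT/app_b/certs"
echo "NEW CERT" > "$ROOT/forwarder.pem"   # stand-in for the regenerated cert

# copy the new forwarder.pem into every app that ships a certs directory
find "$ROOT" -type d -name certs | while read -r d; do
    cp "$ROOT/forwarder.pem" "$d/"
done
```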
STEP 1.c
lgs06: /splunk/etc/deployment-apps/*/certs/ contains cacert.pem and forwarder.pem
Contents of each app's cacert.pem must match the last paragraph of the NEW forwarder.pem
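To grab the last certificate block of the bundle programmatically, here's an awk sketch (assuming the CA cert is the final BEGIN/END CERTIFICATE block in the file, as "splunk createssl" typically writes it):

```shell
# print the last -----BEGIN/END CERTIFICATE----- block in a PEM bundle
last_cert() {
    awk '/BEGIN CERTIFICATE/{f=1; b=""} f{b=b $0 "\n"} /END CERTIFICATE/{f=0} END{printf "%s", b}' "$1"
}

# demo with a fabricated two-cert bundle
pem=$(mktemp)
printf -- '-----BEGIN CERTIFICATE-----\nAAA\n-----END CERTIFICATE-----\n-----BEGIN CERTIFICATE-----\nBBB\n-----END CERTIFICATE-----\n' > "$pem"
last_cert "$pem"
```

You can then diff that output against each app's cacert.pem to confirm they match.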
/splunk/bin/splunk reload deploy-server
splunk admin name: admin
splunk admin pass: <redacted>


STEP 2.a
Heavy Forwarders
Expiration check.
openssl x509 -in owbsleplgf01.pem -text -noout | grep "Not After"
STEP 2.b
Heavy Forwarders
Expiration fix
/opt/splunkforwarder/bin/splunk createssl server-cert -d /opt/splunkforwarder/etc/auth/ -n <forwarderhostname> -p
cp /opt/splunkforwarder/etc/auth/<forwarderhostname>.pem /opt/splunkforwarder/etc/certs
Contents of /opt/splunkforwarder/etc/certs/cacert.pem must match the last paragraph of the NEW forwarder.pem from STEP 1.b
STEP 2.c
Heavy Forwarders
Replace the hashed password in /opt/splunkforwarder/etc/system/local/inputs.conf
Before: password = <encryptedpassword>
After: password = <redacted>
service splunk restart
Troubleshooting:
Deployment server permissions issues? chown -R splunk:splunk /splunk/etc/deployment-apps
Deployments not getting to the hosts? Make sure /opt/splunkforwarder/etc/system/local/deploymentclient.conf isn't disabled

Done!