How to use rex command to extract fields in Splunk? - Karunsubramanian.com (2024)


One of the most powerful features of Splunk, the market leader in log aggregation and operational data intelligence, is the ability to extract fields while searching for data. Unfortunately, it can be a daunting task to get this working correctly. In this article, I’ll explain how you can extract fields using Splunk SPL’s rex command. I’ll provide plenty of examples with actual SPL queries. In my experience, rex is one of the most useful commands in the long list of SPL commands. I’ll also reveal one secret command that can make this process super easy. By reading this article in full, you will gain a deeper understanding of fields and learn how to use the rex command to extract fields from your data.

What is a field?

A field is a name-value pair that is searchable. Virtually all searches in Splunk use fields. A field can contain multiple values. Also, a given field need not appear in all of your events. Let’s consider the following SPL:

index=main sourcetype=access_combined_wcookie action=purchase

The fields in the above SPL are “index”, “sourcetype” and “action”. The values are “main”, “access_combined_wcookie” and “purchase” respectively.

Fields turbocharge your searches by enabling you to customize and tailor them. For example, consider the following SPL:

index=web sourcetype=access_combined status>=500 response_time>6000

The above SPL searches the index web, which happens to have web access logs, with sourcetype equal to access_combined, status greater than or equal to 500 (indicating a server-side error) and response_time greater than 6 seconds (or 6000 milliseconds). This kind of flexibility in exploring data is not possible with simple text searching.

How are fields created?

There is some good news here. Splunk automatically creates many fields for you. The process of creating fields from the raw data is called extraction. By default Splunk extracts many fields during index time. The most notable ones are:

index
host
sourcetype
source
_time
_indextime
splunk_server

You can configure Splunk to extract additional fields during index time based on your data and the constraints you specify. This process is also known as adding custom fields during index time. This is achieved through configuring props.conf, transforms.conf and fields.conf. Note that if you are using Splunk in a distributed environment, props.conf and transforms.conf reside on the Indexers (also called Search Peers) while fields.conf resides on the Search Heads. And if you are using a Heavy Forwarder, props.conf and transforms.conf reside there instead of on the Indexers.

While index-time extraction seems appealing, you should try to avoid it for the following reasons.

  1. Indexed extractions use more disk space.
  2. Indexed extractions are not flexible, i.e., if you change the configuration of any indexed extraction, the entire index needs to be rebuilt.
  3. There is a performance impact as Indexers do more work during index time.

Instead, you should use search-time extractions. Schema-on-Read, in fact, is the superior strength of Splunk that you won’t find in any other log aggregation platform. Schema-on-Write, which requires you to define the fields ahead of indexing, is what you will find in most log aggregation platforms (including Elasticsearch). With the Schema-on-Read approach that Splunk uses, you slice and dice the data during search time with no persistent modifications done to the indexes. This also provides the most flexibility, as you define how the fields should be extracted.

Many ways of extracting fields in Splunk during search-time

There are several ways of extracting fields during search-time. These include the following.

  1. Using the Field Extractor utility in Splunk Web
  2. Using the Fields menu in Settings in Splunk Web
  3. Using the configuration files
  4. Using SPL commands
    • rex
    • extract
    • multikv
    • spath
    • xmlkv/xpath
    • kvform

For Splunk neophytes, using the Field Extractor utility is a great start. However, as you gain more experience with field extractions, you will start to realize that the Field Extractor does not always come up with the most efficient regular expressions. Eventually, you will start to leverage the power of the rex command and regular expressions, which is what we are going to look at in detail now.

What is rex?

rex is an SPL (Search Processing Language) command that extracts fields from the raw data based on the pattern you specify using regular expressions.

The command takes search results as input (i.e., the command is written after a pipe in SPL). It matches a regular expression pattern in each event, and saves the value in a field that you specify. Let’s see a working example to understand the syntax.

Consider the following raw event.

Thu Jan 16 2018 00:15:06 mailsv1 sshd[5801]: Failed password for invalid user desktop from 194.8.74.23 port 2285 ssh2

The above event is from Splunk tutorial data. Let’s say you want to extract the port number as a field. Using the rex command, you would use the following SPL:

index=main sourcetype=secure
| rex "port\s(?<portNumber>\d+)\s"

Once you have port extracted as a field, you can use it just like any other field. For example, the following SPL retrieves events with port numbers between 1000 and 2000.

index=main sourcetype=secure
| rex "port\s(?<portNumber>\d+)\s"
| where portNumber >= 1000 AND portNumber < 2000
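
To see what this extraction does outside of Splunk, here is a minimal Python sketch of the same idea. It is only an approximation of rex: Python's re module spells named groups as (?P<name>...) rather than (?<name>...), the second sample event is made up for illustration, and the helper name extract_port is hypothetical.

```python
import re

# The first event is the tutorial event shown above; the second is made up.
events = [
    "Thu Jan 16 2018 00:15:06 mailsv1 sshd[5801]: Failed password for "
    "invalid user desktop from 194.8.74.23 port 2285 ssh2",
    "Thu Jan 16 2018 00:15:07 mailsv1 sshd[5802]: Failed password for "
    "invalid user admin from 194.8.74.23 port 1533 ssh2",
]

# Python's re module requires (?P<name>...) for named capture groups.
PORT_PATTERN = re.compile(r"port\s(?P<portNumber>\d+)\s")


def extract_port(event):
    """Return the port as an int, or None when the pattern does not match."""
    match = PORT_PATTERN.search(event)
    return int(match.group("portNumber")) if match else None


# Roughly what "| rex ..." followed by "| where ..." does in the SPL above.
ports = [extract_port(event) for event in events]
in_range = [p for p in ports if p is not None and 1000 <= p < 2000]

print(ports)
print(in_range)
```

Only the second event survives the filter, because its port falls between 1000 and 2000.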

Note: rex has two modes of operation. The sed mode, denoted by the option mode=sed, lets you replace characters in an existing field. We will not discuss sed mode further in this blog.

Note: Do not confuse the SPL command regex with rex. regex filters search results using a regular expression (i.e., it removes events that do not match the regular expression provided with the regex command).

Syntax of rex

Let’s unpack the syntax of rex.

rex field=<field> <PCRE named capture group>

The PCRE named capture group works the following way:

(?<name>regex)

The above expression captures the text matched by regex into the group name.

Note: You may also see (?P<name>regex) used in named capture groups (notice the character P). In Splunk, you can use either approach.

If you don’t specify the field name, rex applies to _raw (which is the entire event). Specifying a field greatly improves performance, especially if your events are large. Typically, I would consider any event over 10-15 lines as large.

There is also an option named max_match, which is set to 1 by default, i.e., rex retains only the first match. If you set this option to 0, there is no limit to the number of matches in an event, and rex creates a multivalued field in case of multiple matches.
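
The difference between keeping the first match and keeping every match can be sketched in Python. This is an analogy rather than Splunk's implementation: the event text and the field name ip are made up, re.search stands in for the default max_match=1, and re.finditer stands in for max_match=0.

```python
import re

# A made-up event that contains three IP addresses.
event = "src=10.0.0.1 dst=10.0.0.2 via=10.0.0.3"

IP_PATTERN = re.compile(r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})")

# Comparable to the default max_match=1: only the first match is kept.
first_only = IP_PATTERN.search(event).group("ip")

# Comparable to max_match=0: every match is kept, like a multivalued field.
all_matches = [m.group("ip") for m in IP_PATTERN.finditer(event)]

print(first_only)
print(all_matches)
```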

As you can sense by now, mastering rex means getting a good handle on regular expressions. In fact, it is all about regular expressions when it comes to rex. It is best learned through examples. Let’s dive right in.

Learn rex through examples

Extract a value that follows a string

Raw Event:

Thu Jan 16 2018 00:15:06 mailsv1 sshd[5258]: Failed password for invalid user testuser from 194.8.74.23 port 3626 ssh2

Extract a field named username that follows the string user in the events.

index=main sourcetype=secure
| rex "user\s(?<username>\w+)\s"

Isn’t that beautiful?

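Now, let’s dig deep into the command and break it down.

user: A literal string user
\s : A single whitespace character
(?<username>...) : A named capture group. Whatever the pattern inside the parentheses matches is saved in the field username
\w+ : One or more word characters (letters, digits or underscore)
\s : A single whitespace character, which ends the match right after the user name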

Extract a value based on a pattern of the string

This can be super handy. Extract Java exceptions as a field.

Raw Event:

08:24:42 ERROR : Unexpected error while launching program.
java.lang.NullPointerException
at com.xilinx.sdk.debug.core.XilinxAppLaunchConfigurationDelegate.isFpgaConfigured(XilinxAppLaunchConfigurationDelegate.java:293)

Extract any Java exception as a field. Note that Java exceptions have the form java.<package hierarchy>.<Exception>. For example:

java.lang.NullPointerException
java.net.ConnectException
javax.net.ssl.SSLHandshakeException

So, the following regex matching will do the trick.

java\..*Exception

Explanation:

java: A literal string java
\. : Backslash followed by period. In regex, a backslash escapes the following character, meaning the character is interpreted literally. A period (.) stands for any character in regex. In this case we want to literally match a period, so we escape it.
.* : Period followed by star (*). In regex, * indicates zero or more of the preceding element. Simply put, .* means any run of characters.
Exception: A literal string Exception.

Our full blown SPL looks like this:

index=main sourcetype=java-logs
| rex "(?<javaException>java\..*Exception)"
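
Here is a small Python sketch of the pattern explained above, assuming Python's re module behaves like the PCRE engine for these simple constructs. The named group is written as (?P<javaException>...) because that is the spelling Python requires, and the event text is a shortened copy of the raw event shown earlier.

```python
import re

# Shortened copy of the raw event above, with explicit line breaks.
event = (
    "08:24:42 ERROR : Unexpected error while launching program.\n"
    "java.lang.NullPointerException\n"
    "at com.xilinx.sdk.debug.core.XilinxAppLaunchConfigurationDelegate"
)

JAVA_EXCEPTION = re.compile(r"(?P<javaException>java\..*Exception)")
java_exception = JAVA_EXCEPTION.search(event).group("javaException")
print(java_exception)

# Why the period is escaped: an unescaped "." matches any character,
# so "java." also matches the first five characters of "javax".
unescaped_matches_javax = re.search(r"java.", "javax") is not None
escaped_matches_javax = re.search(r"java\.", "javax") is not None
print(unescaped_matches_javax, escaped_matches_javax)
```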

Let’s add some complexity to it. Let’s say you have exceptions that look like the following:

javax.net.ssl.SSLHandshakeException

Notice the “x” in javax? How can we account for it? Ideally, we want rex to extract the Java exception regardless of whether it starts with javax or java. We can do that with a character class and the “?” quantifier, which makes the preceding element optional:

java[x]?\..*Exception

Let us consider new raw events.

08:24:42 ERROR : Unexpected error while launching program.
java.lang.NullPointerException
at com.xilinx.sdk.debug.core.XilinxAppLaunchConfiguration

08:24:43 ERROR : javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Our new SPL looks like this:

index=main sourcetype=java-logs1
| rex "(?<javaException>java[x]?\..*Exception)"

That’s much better. Our extracted field javaException captured the exception from both the events.

Wait a minute. Is something wrong with this extraction?

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException

The extraction captured more than one exception. The raw event looks like this:

08:24:43 ERROR : javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

The regex java[x]?\..*Exception does not stop at the first instance of the string “Exception”. It keeps matching all the way up to the last instance on the line.

This is called greedy matching in regex. By default, quantifiers such as “*” and “+” try to match as many characters as possible. To force ‘lazy’ behaviour, we add “?” right after the quantifier, so that “.*?” matches as few characters as possible. Our new SPL looks like this:

index=main sourcetype=java-logs1
| rex "(?<javaException>java[x]?\..*?Exception)"

That’s much much better!
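
The greedy and lazy behaviour can be compared side by side in Python. This is a sketch under the assumption that Python's re module treats these quantifiers the same way as the PCRE engine behind rex, and it uses the (?P<name>...) spelling that Python requires. With the greedy pattern, the match runs to the last “Exception” on the line, while the lazy pattern stops at the first one.

```python
import re

# The second raw event from the example above.
event = (
    "08:24:43 ERROR : javax.net.ssl.SSLHandshakeException: "
    "sun.security.validator.ValidatorException: PKIX path building failed: "
    "sun.security.provider.certpath.SunCertPathBuilderException: "
    "unable to find valid certification path to requested target"
)

GREEDY = re.compile(r"(?P<javaException>java[x]?\..*Exception)")
LAZY = re.compile(r"(?P<javaException>java[x]?\..*?Exception)")

greedy_value = GREEDY.search(event).group("javaException")
lazy_value = LAZY.search(event).group("javaException")

print(greedy_value)  # runs on to the last "Exception" on the line
print(lazy_value)    # stops at the first "Exception"
```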

Extract credit card numbers as a field

Let’s say you have credit card numbers in your log file (a very bad idea). Let’s say they all have the format XXXX-XXXX-XXXX-XXXX, where X is any digit. You can easily extract the field using the following SPL:

index="main" sourcetype="custom-card"
| rex "(?<cardNumber>\d{4}-\d{4}-\d{4}-\d{4})"

The {} applies a multiplier. For example, \d{4} means exactly 4 digits, and \d{1,4} means between 1 and 4 digits. Note that you can group characters and apply multipliers to the group too. For example, the above SPL can be written as follows:

index="main" sourcetype="custom-card"
| rex "(?<cardNumber>(\d{4}-){3}\d{4})"
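
As a quick check that the two forms are interchangeable, here is a Python sketch that runs both against the same text. The event and the card number are made up, the group uses Python's (?P<name>...) spelling, and the inner group is written as a non-capturing (?:...) group, which is a small deviation from the SPL above.

```python
import re

# A made-up event; the card number is not a real one.
event = "2018-01-16 payment accepted card=4716-1234-5678-9010 amount=25.00"

LONG_FORM = re.compile(r"(?P<cardNumber>\d{4}-\d{4}-\d{4}-\d{4})")
GROUPED = re.compile(r"(?P<cardNumber>(?:\d{4}-){3}\d{4})")

long_value = LONG_FORM.search(event).group("cardNumber")
grouped_value = GROUPED.search(event).group("cardNumber")

print(long_value)
print(grouped_value)
```

Note that the date at the start of the event is not picked up, because 01 is only two digits and the pattern insists on groups of four.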

Extract multiple fields

You can extract multiple fields in the same rex command.

Consider the following raw event

Thu Jan 16 2018 00:15:06 mailsv1 sshd[5276]: Failed password for invalid user appserver from 194.8.74.23 port 3351 ssh2

The above event is from Splunk tutorial data.

You can extract the user name, ip address and port number in one rex command as follows:

index="main" sourcetype=secure
| rex "invalid user (?<userName>\w+) from (?<ipAddress>(\d{1,3}\.){3}\d{1,3}) port (?<port>\d+) "

Also note that you can pipe the results of the rex command to further reporting commands. For example, if you want to find the top users with login errors in the above example, you can use the following SPL:

index="main" sourcetype=secure
| rex "invalid user (?<userName>\w+) from (?<ipAddress>(\d{1,3}\.){3}\d{1,3}) port (?<port>\d+) "
| top limit=15 userName
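
The same pipeline can be approximated in Python: one pattern with three named groups, followed by a count per user. Treat it as a rough sketch. Two of the events come from the examples in this article, the middle one is made up, the dots in the IP address part are escaped so that they only match literal periods, and Counter.most_common stands in for the top command.

```python
import re
from collections import Counter

# First and third events appear in this article; the second is made up.
events = [
    "Thu Jan 16 2018 00:15:06 mailsv1 sshd[5276]: Failed password for "
    "invalid user appserver from 194.8.74.23 port 3351 ssh2",
    "Thu Jan 16 2018 00:15:07 mailsv1 sshd[5277]: Failed password for "
    "invalid user appserver from 194.8.74.23 port 3352 ssh2",
    "Thu Jan 16 2018 00:15:06 mailsv1 sshd[5258]: Failed password for "
    "invalid user testuser from 194.8.74.23 port 3626 ssh2",
]

PATTERN = re.compile(
    r"invalid user (?P<userName>\w+) "
    r"from (?P<ipAddress>(?:\d{1,3}\.){3}\d{1,3}) "
    r"port (?P<port>\d+) "
)

# One dict of extracted fields per matching event.
rows = [m.groupdict() for e in events if (m := PATTERN.search(e))]

# Rough equivalent of "| top limit=15 userName".
top_users = Counter(row["userName"] for row in rows).most_common(15)

print(rows[0])
print(top_users)
```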

Regular Expression Cheat-Sheet

[Image: regular expression cheat-sheet]

A short-cut

Regex, while powerful, can be hard to grasp in the beginning. Fortunately, Splunk includes a command called erex which will generate the regex for you. All you have to do is provide samples of data and Splunk will figure out a possible regular expression. While I don’t recommend relying fully on erex, it can be a great way to learn regex.

For example, use the following SPL to extract the IP address from the data we used in our previous example:

index="main" sourcetype=secure
| erex ipAddress examples="194.8.74.23,109.169.32.135"

Not bad at all. Without writing any regex, we are able to use Splunk to figure out the field extraction for us. Here is the best part: When you click on “Job” (just above the Timeline), you can see the actual regular expression that Splunk has come up with.

Successfully learned regex. Consider using: | rex "(?i) from (?P<ipAddress>[^ ]+)"

I’ll let you analyze the regex that Splunk has come up with for this example :-). One hint: the (?i) in the above regex stands for “case insensitive”.

That brings us to the end of this blog. I hope you have become a bit more comfortable using rex to extract fields in Splunk. Like I mentioned, it is one of the most powerful commands in SPL. Feel free to use it as often as you need. Before you know it, you will be helping your peers with regex.

Happy Splunking!

