EFilter - A query language for Rekall.¶
The Rekall framework is plugin based. This is what makes it so extensible. Developers can add many different plugins to implement different analysis techniques and produce different data.
Historically, plugins had no restriction over the type of output they produced. While some plugins put thought into producing structured output, others produced output which was only usable by humans, since it was largely unstructured. As the needs for automation increased, it soon became obvious that plugin output needs to be machine parseable in some way.
For example, consider the humble pslist (APIPslist) plugin - a simple plugin which just displays the list of running processes in tabular form. Initially this plugin produced a number of columns such as process name, pid etc. Some users required the binary path, and that was added. Then some users requires restricing the listed processed by various means, such as a list of pids, process name regular expression, start time etc.
Then some users wanted to combine the output from several plugins in some way. For example, show all the vad regions from a the “chrome” process.
It soon became obvious that we could not just keep adding more and more flags to each plugin to control the way the plugin worked. The same kind of filtering was repeating in many plugins (e.g. filter by process names) and it was difficult to anticipate how users would like to combine plugins in the future.
We wanted to create a mechanism that gave users control over which results they wanted to see, how to filter the output and how to combine the output from several plugins together.
The idea of building a framework to facilitate arbitrary queries was born. We chose to model the query language after SQL which is widely understood, and this is how EFilter was born.
What is EFilter?¶
EFilter is an SQL like query language for combining, filtering and customizing the output of Rekall plugins. Just like in SQL, EFilter queries are used to generate a customized output, however, unlike a database query, EFilter runs Rekall plugins to generate data dynamically, rather than look at stored data.
Lets look at a simple EFilter query:
select proc.name, pid from pslist() where pid > 4
This query contains three main parts:
- The pslist() plugin will be executed and produce a set of rows. Each row contains several columns.
- The filter condition follows the “where” operator and specifies a condition. EFilter will evaluate the condition on each row emitted from the plugin and only matching rows will be displayed.
- The output is then produced in two columns which are derived from each emitted row.
Describing Plugins¶
In order for EFilter to work, each plugin must produce structured output in a specified format. We have seen before that plugins produce a sequence of rows, with each row having several columns. Each cell is a specific type of object.
Let us examine the pslist() plugin again. To get information about each plugin output we can use the describe (Describe) plugin:
[1] Live (API) 16:18:50> describe pslist, max_depth=1
Field Type
-------------------------------------------------- ----
proc LiveProcess
. as_dict method
. cmdline list
. connections list
. cpu_affinity list
. cpu_percent float
. cpu_times pcputimes
. create_time float
. cwd str
. environ dict
. exe str
. get_process_address_space method
. gids pgids
Name str
pid int
ppid int
Thds int
wow64 bool
start UnixTimeStamp
In the above example, we see that the plugin generates a Name columen with a type of string, pid and ppid columns which are integers as well as a more complex type, such as a UnixTimeStamp.
We can also see the field proc which is of type LiveProcess. This more complex type is like a python dictionary itself, and contains multiple members.
Note
In Rekall each plugin is free to produce any output - the output types of each plugin are not defined in advance (since they might change depending on the profile, OS version etc). Therefore it is difficult to predict in advance what each column will contain.
The describe plugin therefore needs to actually run the plugin and it inspects the output of the first row produced. While this works most of the time, it is often not possible to get a sensible result without supplying proper arguments. For example, consider the glob (IRGlob) plugin. When run with no arguments it does not produce any results (since there is nothing to glob). Therefore describe (Describe) will produce incorrect results.
To solve this predicament it is possible to run the describe() plugin with the args parameter, which should be a python dict of parameters to be passed to the plugin. This way the plugin maybe run with reasonable parameters and produce reasonable results.
We can apply operators on the cells emitted by a specific plugin to generate the desired output. For example, suppose we wanted to show the command line for each running process. We can see the proc object contains a cmdline field, and so we can simply issue:
select proc.name, proc.cmdline from pslist()
Note that the cmdline is a list (it is the process’s argv), and so Rekall will display it as such using the special annotation:
[1] Live (API) 16:32:48> select proc.name, proc.cmdline from pslist() where proc.name =~ "rekall"
cmdline name
----------------------------------- -------
- 0: rekall
/home/mic/projects/Dev/bin/python3
- 1:
/home/mic/projects/Dev/bin/rekall
- 2:
-v
- 3:
--live
- 4:
API
Operator rules.¶
EFilter is type aware and will try to do the right thing with each type if it makes sense. When the user applies an operator on a type, the operator will attempt to do something sensible (or else it will just return None). The operator should never raise an error.
For example consider the =~ operator which means a regular expression match. When we apply this operator on a single string, we expect that it match that string:
select * from pslist() where proc.name =~ "rekall"
If however we applied this operator on a list, we expect the row to match if any of the list items matches:
select * from pslist() where proc.cmdline =~ "--live"
Note that it is not an error to try to apply a regular expression to a non-string - it simply will never match. Therefore the following query will always return the empty set, since an integer can never match a regular expression:
select * from pslist() where proc.pid =~ "foobar"
Plugin arguments.¶
In the queries above we just ran the pslist plugin with no arguments. Most Rekall plugins, however, take some form of arguments. We can see the arguments that a plugin takes by consulting the plugin documentation or by appending “?” to the name of the plugin:
[1] Live (API) 21:12:35> pslist?
file: rekall-core/rekall/plugins/response/processes.py
Plugin: APIPslist (pslist)
: This is a Typed Plugin.
Positional Args: pids: One or more pids of processes to select. (type: ArrayIntParser)
Keyword Args:
profile: Name of the profile to load. This is the filename of the profile found in the profiles directory. Profiles are searched in the profile path order (If specified we disable autodetection).
proc_regex: A regex to select a process by name. (type: RegEx)
verbosity: An integer reflecting the amount of desired output: 0 = quiet, 10 = noisy. (type: IntParser)
It is possible to feed the result of an efilter query into the parameters from another plugin. Here is a trivial example:
[1] Live (API) 21:19:53> select * from pslist(pids: (select pid from pslist() where proc.name =~ "rekall"))
proc Name pid ppid Thds Hnds wow64 start binary
-------------- ------- ----- ----- ---- ---- ----------------- --------------------- -----------------------------------
rekall (7826) rekall 7826 7746 105 False 2018-01-27 05:12:20Z /home/mic/projects/Dev/bin/python3
Note the following about the subselect syntax:
Argument names are provided to the plugin with the “:” operator. This assigns the output of the sub-select as a list into the parameter.
The subselect must yield a single column. If the subselect yields more than one column, it is not clear which column should be assigned to the plugin parameter and Rekall will issue an error:
[1] Live (API) 21:19:43> select * from pslist(pids: (select * from pslist() where proc.name =~ "rekall")) 2018-01-26 21:19:43,526:CRITICAL:rekall.1:Invalid Args: pids invalid: Arg pids must be a list of integers.
The arg assigment operator tries to convert the subselect column into the type required by the parameter. This means that if the parameter expects an integer then the subselect should yield something which should be convertible to an integer:
[1] Live (API) 21:26:02> select * from pslist(pids: (select proc.name from pslist() where proc.name =~ "rekall")) 2018-01-26 21:26:02,643:CRITICAL:rekall.1:Invalid Args: pids invalid: invalid literal for int() with base 10: 'rekall'.
EFilter functions.¶
We have seen that EFilter offers operators to work on columns. In this section we see some of the more common functions and operators the language provides.
timestamp¶
The timestamp function converts its argument into a timestamp object. This allows Rekall to operate on the timestamp in a timezone aware way, compare it to other times etc.
Examples¶
The following are example queries which demonstrate how some plugins may be stringed together to achieve powerful combinations.
Finding Processes launched by a certain user.¶
Rekall has the tokens (GetSIDs) plugin which displays all the authorization tokens possessed by each process. Rekall also automatically resolves the token’s SID to a username.
[1] hank.aff4 22:54:29> tokens()
Process Sid Comment
-------------------------------------- ---------------- ----------------------------------
0xfa8000c9e040 System 4 S-1-5-18 Local System
0xfa8000c9e040 System 4 S-1-5-32-544 Administrators
0xfa8000c9e040 System 4 S-1-1-0 Everyone
0xfa8000c9e040 System 4 S-1-5-11 Authenticated Users
0xfa8000c9e040 System 4 S-1-16-16384 System Mandatory Level
Lets see all the processes started by “jessie”:
[1] hank.aff4 22:56:14> select * from tokens() where Comment =~ 'User: jessie'
Process Sid Comment
----------------------------------- ---------------------------------------------- -------------
0xfa8002418440 regsvr32.exe 884 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
0xfa8001417720 explorer.exe 1512 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
0xfa8000f95b30 VBoxTray.exe 1964 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
0xfa8000fdc780 miranda64.exe 2208 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
0xfa80022e2230 dwm.exe 2520 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
0xfa8000f7d1b0 taskhost.exe 2596 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
0xfa8002376060 taskhost.exe 2848 S-1-5-21-4270721788-567995706-2532315982-1003 User: jessie
Lets view each process creation time and its full command line. The Process column is not simply a string. It is a full blown Rekall object which represents the kernel’s _EPROCESS struct. We therefore can dereference individual members of _EPROCESS and retrieve additional information.
[1] hank.aff4 22:59:13> select Process, Process.CreateTime, Comment, Process.Peb.ProcessParameters.CommandLine from tokens() where Comment =~ 'User: jessie'
Process CreateTime Comment CommandLine
----------------------------------- --------------------- ------------- ---------------------------------------------------
0xfa8002418440 regsvr32.exe 884 2015-08-10 02:00:45Z User: jessie
0xfa8001417720 explorer.exe 1512 2015-08-10 02:00:41Z User: jessie C:\Windows\Explorer.EXE
0xfa8000f95b30 VBoxTray.exe 1964 2015-08-10 02:01:05Z User: jessie "C:\Windows\System32\VBoxTray.exe"
0xfa8000fdc780 miranda64.exe 2208 2015-08-10 02:01:37Z User: jessie "C:\Program Files (x86)\Miranda IM\miranda64.exe"
0xfa80022e2230 dwm.exe 2520 2015-08-10 02:00:41Z User: jessie "C:\Windows\system32\Dwm.exe"
0xfa8000f7d1b0 taskhost.exe 2596 2015-08-10 02:13:51Z User: jessie "taskhost.exe"
0xfa8002376060 taskhost.exe 2848 2015-08-10 02:00:40Z User: jessie "taskhost.77exe"
Find files modified in the last 2 days.¶
When Rekall is run in live mode, it can examine files on the local filesystem. This is useful for incident response situations. One of the more useful plugins available in live mode is the glob (IRGlob) plugin which enumerate files on the local filesystem based on one or more glob expressions (similar to the shell glob). According to the plugin documentation, we see that the plugin accepts a repeated parameter called “globs” for all the glob expressions. Let’s see all the files in the /etc/ directory:
[1] Live (API) 23:49:05> select * from glob(globs: "/etc/*")
path
----------------------------
/etc/papersize
/etc/logrotate.d
/etc/mime.types
/etc/kbd
Although the output appears to only contain a single column (“path”), we can see that the path is actually an object which contains a lot of information about each file.
[1] Live (API) 00:16:03> describe glob, args=dict(globs=["/etc/*"])
Field Type
-------------------------------------------------- ----
path FileInformation
. filename FileSpec
.. filesystem str
.. name str
.. path_sep str
. session -
. st_atime float
. st_ctime float
. st_dev int
. st_gid Group
.. gid int
.. group_name str
.. session NoneType
. st_ino int
. st_mode Permissions
. st_mtime float
. st_nlink int
. st_size int
. st_uid User
.. homedir str
.. session NoneType
.. shell str
.. uid int
.. username str
In particular we see that the path.st_mtime is a float describing the file’s modification time:
[1] Live (API) 00:29:08> select path.st_mtime, path from glob(globs: "/etc/*")
st_mtime path
------------------- ----------------------------
1516590897.1290069 /etc/papersize
1516687780.2982903 /etc/logrotate.d
1446219570.0 /etc/mime.types
Since the field is a float, Rekall does not understand that it is actually a timestamp, and therefore we can not do any time arithmetic on it. We therefore need to explitely convert the modification time to a timestamp using the timestamp function.
[1] Live (API) 00:31:50> select timestamp(path.st_mtime) as mtime, path from glob(globs: "/etc/*") where mtime > "2 days ago"
mtime path
--------------------- -----------------
2018-01-29 06:11:15Z /etc/resolv.conf
2018-01-29 06:11:15Z /etc/timezone
- Note the explicit conversion to a timestamp. This allows Rekall to apply time related operators on this column.
- The column is aliased as “mtime”, which appears as the title of the first column. More importantly, the alias can be used in further calculations (specifically inside the where clause).
- Note the human readable time specification “2 days ago”. Rekall supports such convenient expressions, as well as exactly formatted times.