The instrumentation points listed below are the ones discovery uses to gather information from the kernel. Note that every thread that passes through an instrumentation point is identified by its pid, which is gathered at that point.
In all of the listed events, the task structure pointer is available through the current macro in the kernel. The information associated with each event can be used, together with the current process, to identify a number of scenarios of interest when discovering computation structure.
Fork - FORK/DO_FORK
- Common to both process forking and thread spawning.
- child thread’s pid,
- clone flags.
- kernel/fork.c - do_fork()
- Unique identifier - task structure.
- This information is generally used to add a child process to a CCSM set representing the computation, whose contents we are discovering.
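The fork rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the discovery implementation: the event dictionary layout and the names handle_fork, parent_pid, child_pid and clone_flags are hypothetical, and the set holds pids where the real tool identifies tasks by their task structure. CLONE_THREAD is the standard Linux clone flag value.

```python
CLONE_THREAD = 0x00010000  # Linux clone flag: child joins the parent's thread group


def handle_fork(ccsm_set, event):
    """Add the child to the set when the forking task is already a member.

    `event` is a hypothetical dict holding the pid of the task that passed
    through the instrumentation point, the child's pid, and the clone flags.
    Returns "thread" or "process" if the child was added, otherwise None.
    """
    if event["parent_pid"] not in ccsm_set:
        return None
    ccsm_set.add(event["child_pid"])
    return "thread" if event["clone_flags"] & CLONE_THREAD else "process"


members = {100}
kind = handle_fork(members, {"parent_pid": 100, "child_pid": 101,
                             "clone_flags": CLONE_THREAD})
```

Because the instrumentation point is common to forking and thread spawning, the clone flags are the only part of the event that tells the two cases apart in this sketch.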
Signal - SIGNAL/SEND_SIGNAL
- Not exactly a passive component, since no data structure represents the signal as a separate entity.
- Thread receiving the signal (pid),
- Signal Number.
- kernel/signal.c - send_signal()
- Unique identifier - task structure
- If either the sending or the receiving thread is a member of the CCSM set representing the computation whose structure we are discovering, then we add the other one if it is not already a member.
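The signal rule is symmetric: membership of either endpoint pulls the other one in. A minimal sketch, assuming the set holds pids and that the sender is the current task at the instrumentation point (the function name handle_signal is hypothetical):

```python
def handle_signal(ccsm_set, sender_pid, receiver_pid):
    """Apply the signal rule: if one endpoint is a member, add the other.

    Returns the pid that was newly added, or None if the set did not change.
    """
    sender_in = sender_pid in ccsm_set
    receiver_in = receiver_pid in ccsm_set
    if sender_in and not receiver_in:
        ccsm_set.add(receiver_pid)
        return receiver_pid
    if receiver_in and not sender_in:
        ccsm_set.add(sender_pid)
        return sender_pid
    return None  # both already members, or neither is


members = {10}
first = handle_signal(members, 10, 20)   # member signals an outsider
second = handle_signal(members, 30, 20)  # outsider signals a member
third = handle_signal(members, 40, 50)   # unrelated pair, ignored
```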
Named Pipe/FIFO - FIFO/FIFO_OPEN
- It is a passive component that is persistent in the filesystem.
- Named Pipe inode id
- File System id
- Complete pathname to the pipe/fifo file.
- Mode in which the named pipe was opened.
- fs/fifo.c - fifo_open()
- Unique Identifier - FileSystemID/InodeID
- If the current process is a member of the CCSM set of concern, then we add the Named pipe component to the set, if it is not already a member. If the current process is not a member of the CCSM set, then we add it to the set if the named pipe is a member of the set.
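The named pipe rule works in both directions, which the following Python sketch tries to make explicit. It assumes two plain sets, one of member pids and one of member passive components keyed by (file system id, inode id); the function name and the example identifiers are hypothetical.

```python
def handle_fifo_open(member_pids, member_components, pid, fs_id, inode_id):
    """Apply the named pipe rule for one FIFO/FIFO_OPEN event."""
    key = (fs_id, inode_id)  # unique identifier: FileSystemID/InodeID
    if pid in member_pids:
        # A known process opened the pipe: the pipe joins the set.
        member_components.add(key)
    elif key in member_components:
        # An unknown process opened a known pipe: the process joins the set.
        member_pids.add(pid)


pids = {1}
components = set()
handle_fifo_open(pids, components, 1, "tmpfs", 42)  # member opens a new pipe
handle_fifo_open(pids, components, 2, "tmpfs", 42)  # outsider opens the same pipe
handle_fifo_open(pids, components, 3, "tmpfs", 99)  # unrelated process and pipe
```

Event order matters in this sketch: the outsider is only added because the pipe was already in the set when it opened it.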
Pipes - PIPE/DO_PIPE
- Passive component that can only be accessed by processes that are related to each other through forking.
- Pipe inode id
- File System id
- Reader File Descriptor.
- Writer File Descriptor.
- Mode in which the reader end of the pipe was opened.
- Mode in which the writer end of the pipe was opened.
- If the current process is a member of the CCSM set, then we add the Pipe Component to the set.
- fs/pipe.c - do_pipe_flags()
- Unique Identifier - FileSystemID/InodeID
Shared Memory
- Passive Component that represents a block of memory used for sharing information across processes.
- Shared Memory Get, SHMEM/SHMGET
- Inode id of the shared memory.
- File system id.
- shared memory address.
- Key
- Size
- Flags
- shared memory id
- ipc/util.c - ipcget()
- Shared Memory Attach, SHMEM/SHMAT
- Flags
- Shared Memory ID
- Shared Memory Address
- Inode Id
- Filesystem ID
- ipc/shm.c - shmat()
- Shared Memory Detach, SHMEM/SHMDT
- Shared memory Address
- Inode id
- File system Id.
- ipc/shm.c - shmdt()
- Unique Identifier - FileSystemID/InodeID
- If the current process is a member of the set, then we add the shared segment to the CCSM set. If the current process is not a member, then we check whether it is trying to access any of the shared segments of the CCSM set; if so, we add the current process to the set.
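As a rough illustration of how the three shared memory events could be combined, the sketch below applies the membership rule to every event and also records which processes are currently attached to each segment. The class name ShmTracker, its attributes, and the attach bookkeeping are assumptions made for this example; only the event names and the FileSystemID/InodeID key come from the list above.

```python
from collections import defaultdict


class ShmTracker:
    """Hypothetical tracker for SHMEM/SHMGET, SHMEM/SHMAT and SHMEM/SHMDT events."""

    def __init__(self, member_pids):
        self.member_pids = set(member_pids)
        self.member_segments = set()
        self.attached = defaultdict(set)  # segment key -> pids currently attached

    def on_event(self, name, pid, fs_id, inode_id):
        key = (fs_id, inode_id)  # unique identifier: FileSystemID/InodeID
        if pid in self.member_pids:
            self.member_segments.add(key)
        elif key in self.member_segments:
            self.member_pids.add(pid)
        # Attach bookkeeping is an addition of this sketch.
        if name == "SHMAT":
            self.attached[key].add(pid)
        elif name == "SHMDT":
            self.attached[key].discard(pid)


tracker = ShmTracker({1})
tracker.on_event("SHMGET", 1, "shm", 7)
tracker.on_event("SHMAT", 1, "shm", 7)
tracker.on_event("SHMAT", 2, "shm", 7)  # outsider attaches to a known segment
tracker.on_event("SHMDT", 1, "shm", 7)
```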
Sockets - SOCKET/<name of the system call in this category>
- Passive component through which computations can talk locally as well as across machines.
- We gather the same set of data for all the socket system calls: socket(), listen(), accept(), connect(), bind(), sendto(), recvfrom().
- Inode id of the socket.
- File system Id of the socket.
- File Descriptor
- Mode in which the socket is opened.
- Family Type of the Socket.
- Destination address and Port
- Source address and Port.
- If the current process is a member of the CCSM set, then we add the Socket component to the set. If the current process is not a member of the set, then we check whether it is trying to access any of the known socket components; if so, we add the process to the set. At this point, we cannot identify the other process if it is on a different machine.
- net/socket.c - respective system calls.
- Unique Identifier - FileSystemID/InodeID
File
- Passive component that processes read information from or write information to.
- We gather this data as part of the open and close system calls for a file, FILE/OPEN, FILE/CLOSE
- Inode id
- File System id
- File mode
- File Descriptor.
- Complete pathname for the file.
- fs/open.c - open(), close()
- We gather this information for the read and write system calls, FILE/READ_INFO, FILE/READ_DATA, FILE/WRITE_INFO, FILE/WRITE_DATA
- Inode id
- File system id
- File Descriptor
- Size of data being transferred
- Data being transferred.
- fs/read_write.c - read(), write()
- This is the information gathered for file-locking purposes, FILE/FLOCK
- Inode Id
- File System Id
- Command issued for file locking (F_SETLK, F_SETLKW, and so on).
- Complete path to the File.
- fs/fcntl.c - do_fcntl()
- Unique Identifier - FileSystemID/InodeID
- If the current process is a member of the CCSM set, then we add the File component being locked or unlocked to the set. If the current process is not a member of the set, then we check whether it is trying to access any of the File components of the set; if so, we add the process to the CCSM set.
So far we have seen the kernel instrumentation points related to passive components. We now list the other instrumentation points that discovery postprocessing uses to learn more about what the process is doing.
Exec - EXECVE/DO_EXECVE
- Used for identifying which program the process has exec'ed.
- Filename of the program being execed.
- fs/exec.c - do_execve().
Dup - FILE/DUP
- Used for detecting when a process duplicates a file descriptor.
- Inode id of the file.
- File system id
- pathname
- old file descriptor
- new file descriptor.
- mode in which the file was opened.
- fs/fcntl.c - dup3().
Fd_install - FILE/FD_INSTALL
- Used for asserting that the process is using that file descriptor.
- File descriptor.
- fs/open.c - fd_install()
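One way postprocessing could combine the open, dup, close and fd_install events is a per-process descriptor table, sketched below. The class FdTable and its method names are hypothetical, and keying by pid is a simplification, since threads of one process share their descriptors.

```python
class FdTable:
    """Hypothetical map from (pid, file descriptor) to (fs_id, inode_id)."""

    def __init__(self):
        self._table = {}

    def on_open(self, pid, fd, fs_id, inode_id):
        self._table[(pid, fd)] = (fs_id, inode_id)

    def on_dup(self, pid, old_fd, new_fd):
        # The new descriptor refers to the same file as the old one.
        if (pid, old_fd) in self._table:
            self._table[(pid, new_fd)] = self._table[(pid, old_fd)]

    def on_close(self, pid, fd):
        self._table.pop((pid, fd), None)

    def resolve(self, pid, fd):
        """Return the (fs_id, inode_id) behind a descriptor, or None if unknown."""
        return self._table.get((pid, fd))


table = FdTable()
table.on_open(1, 3, "ext3", 77)
table.on_dup(1, 3, 4)
table.on_close(1, 3)
```

With such a table, an FD_INSTALL event for a descriptor can be checked against what the earlier events say the process has open.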
System calls - SYSCALL/SYSTEM_CALL
- Used in postprocessing for identifying the list of system calls that the process has used.
- System call Number
- System call arguments.
- hook in entry.S and the instrumentation point in kernel/dski.c
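For the system call events, a simple postprocessing step is to collect, per process, which system calls were used and how often. The sketch below assumes each event has already been reduced to a (pid, system call number) pair; that reduction and the function name are assumptions made for this example.

```python
from collections import Counter, defaultdict


def syscalls_by_pid(events):
    """Count system call numbers per pid from (pid, syscall_number) pairs."""
    used = defaultdict(Counter)
    for pid, number in events:
        used[pid][number] += 1
    return used


usage = syscalls_by_pid([(1, 0), (1, 1), (1, 0), (2, 3)])
calls_used_by_1 = sorted(usage[1])  # distinct system call numbers seen for pid 1
```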
Exit - EXIT/DO_EXIT
- Used for knowing whether a particular process has exited or not.
- exit code.
- kernel/exit.c - do_exit()
Scheduler - SCHEDULER/SWITCH_FROM, SCHEDULER/SWITCH_TO
- These are context-switching events.
- Preemption parameter.
- kernel/sched.c - __sched__schedule()
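As an example of what the two scheduler events make possible, the sketch below pairs each SWITCH_TO with the following SWITCH_FROM of the same pid to estimate how long each task ran. It assumes that every event carries a timestamp and that the events come from a single CPU in time order; neither assumption is stated in the list above, and the tuple layout is hypothetical.

```python
from collections import defaultdict


def cpu_time_by_pid(events):
    """Sum on-CPU time per pid from (timestamp, name, pid) tuples."""
    started = {}              # pid -> timestamp of its last SWITCH_TO
    total = defaultdict(int)
    for timestamp, name, pid in events:
        if name == "SWITCH_TO":
            started[pid] = timestamp
        elif name == "SWITCH_FROM" and pid in started:
            total[pid] += timestamp - started.pop(pid)
    return dict(total)


trace = [
    (0, "SWITCH_TO", 1), (5, "SWITCH_FROM", 1),
    (5, "SWITCH_TO", 2), (8, "SWITCH_FROM", 2),
    (8, "SWITCH_TO", 1), (10, "SWITCH_FROM", 1),
]
run_time = cpu_time_by_pid(trace)
```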
We also care about some user-side instrumentation points, called DSUI points, when the process we are tracing uses the DSUI framework. In other words, if the process being traced is treated as a white box, then we consider these instrumentation points during postprocessing.
DSUI signal thread - DSCVR/DSUI_SIGNAL_THREAD
- Used for identifying which is the DSUI signal thread.
- Signal thread pid
- datastreams/src/dsui/libdsui3/dsui.c - dsui_start()
DSUI Buffer thread - DSCVR/DSUI_BUFFER_THREAD
- Used for identifying which is the DSUI Buffer thread.
- Buffer thread pid
- datastreams/src/dsui/libdsui3/dsui.c - dsui_start()
DSUI Logging thread - DSCVR/DSUI_LOGGING_THREAD
- Used for identifying which is the DSUI Logging thread.
- Logging thread pid
- datastreams/src/dsui/libdsui3/dsui.c - dsui_start()
Traceme - TRACEME/TRACEME_TOOL
- Used for identifying the traceme tool pid.
- Traceme tool thread id
- datastreams/src/datastreams/discovery/tools/traceme
There are also other User level Instrumentation Points that we consider for processing when a Java application is under the scanner. These user level points are placed in the Java Virtual Machine to identify the different components of the Java Virtual Machine involved in executing the Java Application which is treated as a Black Box.
TO BE CONTINUED.
Postprocessing is a very important part of the discovery process. This is where we do further analysis and group the information we get from the kernel and user side into a logical format called the Observed Computation Actions.
Before going into the different Observed Computation Actions, note that two Python data structures have been designed to represent the active component and the passive component. They are called ACS and PCS respectively.
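The actual fields of ACS and PCS are not described here, so the following is only a guess at their shape, built from the data the instrumentation points report. Every field name below is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ACS:
    """Sketch of an active component: a thread or process identified by pid."""
    pid: int
    exec_name: str = ""                           # from EXECVE/DO_EXECVE, if seen
    exited: bool = False                          # set on EXIT/DO_EXIT
    open_fds: dict = field(default_factory=dict)  # fd -> PCS key


@dataclass
class PCS:
    """Sketch of a passive component such as a file, pipe, socket or segment."""
    kind: str
    fs_id: str
    inode_id: int
    pathname: str = ""
    users: set = field(default_factory=set)       # pids that accessed it

    @property
    def key(self):
        # Unique identifier used throughout the list above: FileSystemID/InodeID.
        return (self.fs_id, self.inode_id)


fifo = PCS(kind="fifo", fs_id="tmpfs", inode_id=42, pathname="/tmp/example_fifo")
task = ACS(pid=100)
task.open_fds[3] = fifo.key
fifo.users.add(task.pid)
```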
The instrumentation we are using to profile the computation is also constantly being improved. Here is a list of the information we are reporting in our profile summaries:
The set of passive components. Passive components may be created when a known active component does one of the following:
- Inherits an open file descriptor (only the root thread may do this)
- Calls open to open a reference to a file, named pipe, or pseudo terminal
- Calls pipe to create a pipe
- Calls socket to create a socket endpoint
- Calls accept to create a server socket endpoint
Passive components have the following attributes:
Here we list all the Observed Computation Actions we have come up with. Those that carry the same information as the corresponding instrumentation point are not explained in depth.
TO BE CONTINUED