
/* Make a thread the running thread.  The thread must previously have
   been sleeping, and not holding the CPU semaphore.  This will set
   the thread state to VgTs_Runnable, and the thread will attempt to
   take the CPU semaphore.  By the time it returns, tid will be the
   running thread. */
extern void VG_(set_running) ( ThreadId tid );

/* Set a thread into a sleeping state.  Before the call, the thread
   must be runnable, and holding the CPU semaphore.  When this call
   returns, the thread will be set to the specified sleeping state,
   and will not be holding the CPU semaphore.  Note that another
   thread could be running by the time this call returns, so the
   caller must be careful not to touch any shared state.  It is also
   the caller's responsibility to actually block until the thread is
   ready to run again. */
extern void VG_(set_sleeping) ( ThreadId tid, ThreadStatus state );
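
The intended calling pattern, roughly (a sketch only: VgTs_WaitSys is
used here as a representative sleeping state, and vki_syscall() is a
stand-in for whatever blocking operation the thread is about to do):

    void do_blocking_operation ( ThreadId tid )
    {
       /* We currently hold the CPU semaphore and are VgTs_Runnable. */

       /* Give up the CPU semaphore and mark ourselves as waiting in a
          syscall.  After this point another thread may be running. */
       VG_(set_sleeping) ( tid, VgTs_WaitSys );

       /* Block without holding the semaphore; touching shared state
          here would be a bug. */
       vki_syscall();

       /* Reacquire the CPU semaphore; does not return until we are
          the running thread again. */
       VG_(set_running) ( tid );
    }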


The master semaphore is run_sema in vg_scheduler.c.


(what happens at a fork?)

VG_(scheduler_init) registers sched_fork_cleanup as a child atfork
handler.  sched_fork_cleanup, among other things, reinitializes the
semaphore with a new pipe so the process has its own.
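
The semaphore itself is essentially a pipe holding a single token byte:
a thread acquires the semaphore by reading the byte and releases it by
writing it back.  A rough sketch of the idea, and of why the child-side
atfork handler has to rebuild it (the names sema_t, sema_init, sema_down,
sema_up and the cleanup function's internals are illustrative, not the
real vg_scheduler.c code):

    #include <unistd.h>

    typedef struct { int pipe_fd[2]; } sema_t;

    static void sema_init ( sema_t *s )
    {
       char token = 'V';
       pipe ( s->pipe_fd );                  /* error checking omitted */
       write ( s->pipe_fd[1], &token, 1 );   /* one token: semaphore is free */
    }

    static void sema_down ( sema_t *s )      /* acquire */
    {
       char token;
       read ( s->pipe_fd[0], &token, 1 );    /* blocks until a token arrives */
    }

    static void sema_up ( sema_t *s )        /* release */
    {
       char token = 'V';
       write ( s->pipe_fd[1], &token, 1 );
    }

    /* Child-side atfork handler: after fork() the child shares the pipe
       with the parent, so a token consumed in one process would starve
       the other.  Throw the inherited pipe away and build a private one. */
    static void sched_fork_cleanup_sketch ( sema_t *s )
    {
       close ( s->pipe_fd[0] );
       close ( s->pipe_fd[1] );
       sema_init ( s );
    }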

--------------------------------------------------------------------

Re: New World signal handling
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Julian Seward <jseward@acm.org>
Date: Mon Mar 14 09:03:51 2005

Well, the big-picture things to be clear about are:

   1. signal handlers are process-wide global state
   2. signal masks are per-thread (there's no notion of a process-wide
      signal mask)
   3. a signal can be targeted at either
         1. the whole process (any eligible thread is picked for
            delivery), or
         2. a specific thread

1 is why it is always a bug to temporarily reset a signal handler (say,
for SIGSEGV), because if any other thread happens to be sent one in that
window it will cause havoc (I think there's still one instance of this
in the symtab stuff).
2 is the meat of your questions; more below.
3 is responsible for some of the nitty-gritty detail in the signal stuff,
so it's worth bearing in mind to understand it all.  (Note that even if a
signal is targeted at the whole process, it's only ever delivered to one
particular thread; there's no such thing as a broadcast signal.)
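
A small standalone illustration of points 1-3, using only standard POSIX
calls (nothing here is Valgrind code):

    #include <signal.h>
    #include <pthread.h>
    #include <unistd.h>

    static void handler ( int sig ) { (void)sig; /* ... */ }

    static void illustrate ( pthread_t other_thread )
    {
       struct sigaction sa;
       sigset_t         block_usr1;

       /* (1) Handlers are process-wide: this changes the disposition of
          SIGUSR1 for every thread in the process, not just this one. */
       sa.sa_handler = handler;
       sa.sa_flags   = 0;
       sigemptyset ( &sa.sa_mask );
       sigaction ( SIGUSR1, &sa, NULL );

       /* (2) Masks are per-thread: only the calling thread stops being
          eligible to receive SIGUSR1; other threads are unaffected. */
       sigemptyset ( &block_usr1 );
       sigaddset ( &block_usr1, SIGUSR1 );
       pthread_sigmask ( SIG_BLOCK, &block_usr1, NULL );

       /* (3a) Process-targeted: the kernel picks some one thread which
          does not have SIGUSR1 blocked and delivers it there. */
       kill ( getpid(), SIGUSR1 );

       /* (3b) Thread-targeted: delivered to that specific thread. */
       pthread_kill ( other_thread, SIGUSR1 );
    }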

While a thread is running core code or generated code, it has almost
all its signals blocked (all but the fault signals: SEGV, BUS, ILL, etc).

Every N basic blocks, each thread calls VG_(poll_signals) to see what
signals are pending for it.  poll_signals grabs the next pending signal
which the client signal mask doesn't block, and sets it up for delivery;
it uses the sigtimedwait() syscall to fetch blocked pending signals
rather than have them delivered to a signal handler.  This means that
we avoid the complexity of having signals delivered asynchronously via
the signal handlers; we can just poll for them synchronously when
they're easy to deal with.
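
The polling trick itself is plain POSIX and can be seen in isolation in
a sketch like this (wait_set stands for "signals the client has not
blocked"; the real set construction in Valgrind is more involved):

    #include <signal.h>
    #include <time.h>
    #include <errno.h>

    /* Return the number of a pending signal from wait_set, filling in
       *info, or 0 if nothing is pending.  The signals in wait_set must
       be blocked in this thread, otherwise they would be delivered to a
       handler instead of staying pending. */
    static int poll_one_signal ( const sigset_t *wait_set, siginfo_t *info )
    {
       struct timespec zero = { 0, 0 };     /* do not block: just poll */
       int sig = sigtimedwait ( wait_set, info, &zero );
       if ( sig < 0 ) {
          if ( errno == EAGAIN )
             return 0;                      /* nothing pending right now */
          return 0;                         /* error handling omitted */
       }
       return sig;                          /* caller sets this up for delivery */
    }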

Fault signals, being caused by a specific instruction, are the exception
because they can't be held off; if they're blocked when an instruction
raises one, the kernel will just summarily kill the process.  Therefore,
they need to be always unblocked, and the signal handler is called when
an instruction raises one of these exceptions.  (It's also necessary to
call poll_signals after any syscall which may raise a signal, since
signal-raising syscalls are considered to be synchronous with respect to
their signal; i.e., calling kill(getpid(), SIGUSR1) will call the handler
for SIGUSR1 before kill is seen to complete.)
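
The kill(getpid(), ...) point can be demonstrated with a tiny
single-threaded program (standalone, not Valgrind code):

    #include <signal.h>
    #include <unistd.h>
    #include <assert.h>

    static volatile sig_atomic_t got_usr1 = 0;

    static void on_usr1 ( int sig ) { (void)sig; got_usr1 = 1; }

    int main ( void )
    {
       struct sigaction sa;
       sa.sa_handler = on_usr1;
       sa.sa_flags   = 0;
       sigemptyset ( &sa.sa_mask );
       sigaction ( SIGUSR1, &sa, NULL );

       /* SIGUSR1 is unblocked, so the handler runs before kill()
          returns: the signal is synchronous with the syscall that
          raised it. */
       kill ( getpid(), SIGUSR1 );
       assert ( got_usr1 == 1 );
       return 0;
    }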

The one time when the thread's real signal mask actually matches the
client's requested signal mask is while running a blocking syscall.  We
have to set things up to accept signals during a syscall so that we get
the right signal-interrupts-syscall semantics.  The tricky part about
this is that there's no general atomic
set-signal-mask-and-block-in-syscall mechanism, so we need to fake it
with the stuff in VGA_(_client_syscall)/VGA_(interrupted_syscall).
These two basically form an explicit state machine, where the state
variable is the instruction pointer, which allows it to determine what
point the syscall had got to when the async signal happened.  By keeping
the window where signals are actually unblocked very narrow, the number
of possible states is pretty small.
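
The shape of that state machine, in a heavily simplified sketch (the
labels and the enum below are illustrative only; the real code is the
arch-specific VGA_(_client_syscall) assembly plus
VGA_(interrupted_syscall)):

    /* Pseudo-assembly for the narrow window, with labels marking the
       points the interrupt handler can observe:

          restart:
                ...set the real signal mask to the client's mask...
          unblocked:
                ...load syscall number and arguments...
                int $0x80           <- the syscall instruction itself
          syscall_done:
                ...save the result...
                ...restore the "block everything" mask...
          blocked_again:
    */

    enum SyscallPhase {
       Phase_BeforeSyscall,   /* IP between 'unblocked' and the syscall
                                 insn: nothing has happened yet, so it is
                                 safe to pretend the syscall was never
                                 started and restart or fail it */
       Phase_InSyscall,       /* IP at the syscall insn: the kernel
                                 decides whether it completed or was
                                 interrupted */
       Phase_AfterSyscall     /* IP between 'syscall_done' and
                                 'blocked_again': the syscall completed,
                                 so keep its result */
    };

    /* VGA_(interrupted_syscall) classifies the interrupted IP into one
       of these phases and fixes up the thread's register state
       accordingly before the signal is delivered. */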

This is all quite nice because the kernel does almost all the work of
determining which thread should get a signal, what the correct action is
for a syscall when it has been interrupted, etc.  Particularly nice is
that we don't need to worry about all the queuing semantics and the
per-signal special cases (which are, roughly, that signals 1-32 are not
queued except when they are, and signals 33-64 are queued except when
they aren't).

BUT, there's another complexity: because the Unix signal mechanism has
been overloaded to deal with two separate kinds of events (asynchronous
signals raised by kill(), and synchronous faults raised by an
instruction), we can't block a signal for one form and not the other.
That is, because we have to leave SIGSEGV unblocked for faulting
instructions, it also leaves us open to getting an async SIGSEGV sent
with kill(pid, SIGSEGV).

To handle this case, there's a small per-thread signal queue (I'm using
tid 0's queue for "signals sent to the whole process" - a hack, I'll
admit).  If an async SIGSEGV (etc) signal appears, then it is pushed
onto the appropriate queue.  VG_(poll_signals) also checks these queues
for pending signals to decide what signal to deliver next.  These queues
are only manipulated with *all* signals blocked, so there's no risk of
two concurrent async signal handlers modifying the queues at once.
Also, because the likelihood of actually being sent an async SIGSEGV is
pretty low, the queues are only allocated on demand.
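
The queue mechanism in outline (a sketch only: SigQueue, the sizes and
the helper name are made up here, and the real code uses the core
allocator rather than calloc, which is not async-signal-safe in general):

    #include <signal.h>
    #include <stdlib.h>

    #define SIGQUEUE_MAX 8

    typedef struct {
       int       count;
       siginfo_t sigs[SIGQUEUE_MAX];
    } SigQueue;

    /* Indexed by ThreadId; entry 0 holds whole-process signals.
       Entries stay NULL until a signal actually needs queuing. */
    static SigQueue *sig_queues[/*VG_N_THREADS*/ 256];

    /* Called from the async handler with ALL signals blocked (the
       handler's sa_mask is full), so no other handler can race us. */
    static void queue_signal ( int tid, const siginfo_t *info )
    {
       SigQueue *q = sig_queues[tid];
       if ( q == NULL ) {
          q = calloc ( 1, sizeof(*q) );      /* allocated on demand */
          sig_queues[tid] = q;
       }
       if ( q->count < SIGQUEUE_MAX )
          q->sigs[q->count++] = *info;
       /* else: drop it, roughly what the kernel does when its own
          queue overflows */
    }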


There are two mechanisms to prevent disaster if multiple threads get
signals concurrently.  One is that a signal handler is set up to block a
set of signals while the signal is being delivered.  Valgrind's handlers
block all signals, so there's no risk of a new signal being delivered to
the same thread until the old handler has finished.

The other is that if the thread which receives the signal is not running
(i.e., doesn't hold the run_sema, which implies it must be waiting for a
syscall to complete), then the signal handler will grab the run_sema
before making any global state changes.  Since the only time we can get
an async signal asynchronously is during a blocking syscall, this should
be all the time.  (And since synchronous signals are always the result of
running an instruction, we should already be holding run_sema.)
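
The first mechanism is just the handler's sa_mask.  Something like this
(standard sigaction usage, not the literal Valgrind code):

    #include <signal.h>

    static void async_signal_handler ( int sig, siginfo_t *info, void *uc );

    static void install_handler ( int sig )
    {
       struct sigaction sa;
       sa.sa_sigaction = async_signal_handler;
       sa.sa_flags     = SA_SIGINFO;
       /* Block EVERYTHING while the handler runs: no second signal can
          be delivered to this thread until the handler returns. */
       sigfillset ( &sa.sa_mask );
       sigaction ( sig, &sa, NULL );
    }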


Valgrind will occasionally generate signals for itself.  These are always
synchronous faults, either the result of an instruction fetch or of
something an instruction did.  The two mechanisms are the synth_fault_*
functions, which are used to signal a problem while fetching an
instruction, or getting generated code to call a helper which contains a
fault-raising instruction (used to deal with illegal/unimplemented
instructions and for instructions whose only job is to raise exceptions).
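
The second mechanism amounts to no more than this kind of thing (an
x86-flavoured sketch; the real helpers are callees of generated code
inside Valgrind, not a plain C function like this):

    /* Helper the generated code can call when it needs the current
       instruction to fault.  Executing 'ud2' raises an invalid-opcode
       exception, which the kernel turns into a synchronous SIGILL for
       this thread; Valgrind's SIGILL handler then takes over and builds
       the appropriate signal for the client. */
    static void raise_sigill_helper ( void )
    {
       __asm__ __volatile__ ( "ud2" );
    }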

That all explains how signals come in, but the second part is how they
get delivered.

The main function for this is VG_(deliver_signal).  There are three cases:

   1. the process is ignoring the signal (SIG_IGN)
   2. the process is using the default handler (SIG_DFL)
   3. the process has a handler for the signal

In general, VG_(deliver_signal) shouldn't be called for ignored signals;
if it has been called, it assumes the ignore is being overridden (if an
instruction gets a SEGV etc, SIG_IGN is ignored and treated as SIG_DFL).

VG_(deliver_signal) handles the default handler case, and the
client-specified signal handler case.

The default handler case is relatively easy: the signal's default action
is either Terminate or Ignore.  We can ignore Ignore.

Terminate always kills the entire process; there's no such thing as a
thread-specific signal death.  Terminate comes in two forms: with
coredump, or without.  vg_default_action() will write a core file, and
then will tell all the threads to start terminating; it then longjmps
back to the current thread's scheduler loop.  The scheduler loop will
terminate immediately, and the master_tid thread will wait for all the
others to exit before shutting down the process (this is the same
mechanism as exit_group).

Delivering a signal to a client-side handler modifies the thread state so
that there's a signal frame on the stack, and the instruction pointer is
pointing to the handler.  The fiddly bit is that there are two completely
different signal frame formats: old and RT.  While in theory the exact
shape of these frames on the stack is abstracted, there are real programs
which know exactly where various parts of the structures are on the stack
(most notably, g++'s exception throwing code), which is why there have to
be two separate pieces of code, one for each frame format.  Another
tricky case is dealing with the client stack running out/overflowing
while setting up the signal frame.
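
Mechanically, "delivering to a handler" is just an edit of the thread's
register state.  In very rough outline (the type and field names here are
placeholders, not the real core structures):

    /* ThreadArchState stands for the guest register file Valgrind keeps
       for each thread; esp/eip are the x86 names. */
    typedef struct { unsigned long esp, eip; /* ... */ } ThreadArchState;

    static void push_signal_frame_sketch ( ThreadArchState *regs,
                                            unsigned long    handler_addr,
                                            unsigned long    frame_size,
                                            unsigned long    retcode_addr )
    {
       /* Make room on the client's stack for the frame (old-format or
          RT-format, depending on how the handler was registered). */
       regs->esp -= frame_size;

       /* ...copy the saved register/FPU state, signo, siginfo etc into
          the frame at regs->esp; the frame's return address must point
          at the sigreturn trampoline (retcode_addr)... */

       /* Finally point the thread at the handler; when the scheduler
          next runs this thread, it starts executing the handler. */
       regs->eip = handler_addr;
    }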

Signal return is also interesting.  There are two syscalls, sigreturn
and rt_sigreturn, which a signal handler will use to resume execution.
The client will call the right one for the frame it was passed, so the
core doesn't need to track that state.  The tricky part is moving the
frame's register state back into the thread's state, particularly all
the FPU state reformatting gunk.  Also, *sigreturn checks for new
pending signals after the old frame has been cleaned up, since there's a
requirement that all deliverable pending signals are delivered before
the mainline code makes progress.  This means that a program could
live-lock on signals, but that's what would happen running natively...

Another thing to watch for: programs which unwind the stack (like gdb,
or exception throwers) recognize the existence of a signal frame by
looking at the code the return address points to: if it is one of the
two specific signal return sequences, it knows it's a signal frame.
That's why the signal handler return address must point to a very
specific set of instructions.
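
For reference, on i386 Linux the two sequences unwinders look for are,
roughly, the following (illustrative; the constants are
__NR_sigreturn = 119 and __NR_rt_sigreturn = 173):

    /* old-style frame: popl %eax; movl $__NR_sigreturn,%eax; int $0x80 */
    static const unsigned char sigreturn_seq[] =
       { 0x58, 0xb8, 0x77, 0x00, 0x00, 0x00, 0xcd, 0x80 };

    /* RT frame: movl $__NR_rt_sigreturn,%eax; int $0x80 */
    static const unsigned char rt_sigreturn_seq[] =
       { 0xb8, 0xad, 0x00, 0x00, 0x00, 0xcd, 0x80 };

so the core has to make the handler's return address point at a copy of
exactly one of these two sequences.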


What else.  Ah, the two internal signals.

SIGVGKILL is pretty straightforward: it's just used to dislodge a thread
from being blocked in a syscall, so that we can get the thread to
terminate in a timely fashion.

SIGVGCHLD is used by a thread to tell the master_tid that it has
exited.  However, the only time the master_tid cares about this is when
it has already exited, and it's waiting for everyone else to exit.  If
the master_tid hasn't exited, then this signal is ignored.  It isn't
enough to simply block it, because that would cause a pile of queued
SIGVGCHLDs to build up, eventually clogging the kernel's signal delivery
mechanism.  If it's unblocked and ignored, it doesn't interrupt syscalls
and it doesn't accumulate.


I hope that helps clarify things.  And explain why there's so much stuff
in there: it's tracking a very complex and arcane underlying set of
machinery.

J

--------------------------------------------------------------------

>I've been seeing references to 'master thread' around the place.
>What distinguishes the master thread from the rest?  Where does
>the requirement to have a master thread come from?
>
It used to be tid 1, but I had to generalize it.

The master_tid isn't very special; its main job is at process shutdown.
It waits for all the other threads to exit, and then produces all the
final reports.  Until it exits, it's just a normal thread, with no other
responsibilities.

The alternative to having a master thread would be to make whichever
thread exits last be responsible for emitting all the output.  That
would work, but it would make the results a bit asynchronous (that is,
if the main thread exits and the others hang around for a while, anyone
waiting on the process would see it as having exited, but no results
would have been produced).

VG_(master_tid) is a variable to handle the case where a threaded program
forks.  In the first process, the master_tid will be 1.  If that program
creates a few threads, and then, say, thread 3 forks, the child process
will have a single thread in it.  In the child, master_tid will be 3.
It was easier to make the master thread a variable than to try to work
out how to rename thread 3 to 1 after a fork.

J

--------------------------------------------------------------------

Re: Fwd: Documentation of kernel's signal routing ?
From: David Woodhouse <...>
To: Julian Seward <jseward@acm.org>

> Regarding sys_clone created threads.  I have a vague idea that
> there is a notion of 'thread group'.  I further understand that if
> one thread in a group calls sys_exit_group then all threads in that
> group exit.  Whereas if a thread calls sys_exit then just that
> thread exits.
>
> I'm pretty hazy on this:

Hmm, so am I :)

> * Is the above correct?

Yes, I believe so.

> * How is thread-group membership defined/changed?

By specifying CLONE_THREAD in the flags to clone(), you remain part of
the same thread group as the parent.  In a single-threaded process, the
thread group id (tgid) is the same as the pid.

Linux just has tasks, which sometimes happen to share VM -- and now with
NPTL we also share other stuff like signals, etc.  The 'pid' in Linux is
what POSIX would call the 'thread id', and the 'tgid' in Linux is
equivalent to the POSIX 'pid'.
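
As a concrete illustration of the pid/tgid split (plain Linux, nothing
Valgrind-specific): getpid() returns the Linux tgid (the POSIX pid),
while the gettid syscall returns the per-task id (the POSIX thread id);
the two only coincide in the thread which started the process.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    static void show_ids ( const char *who )
    {
       pid_t tgid = getpid();                        /* POSIX "process id" */
       pid_t tid  = (pid_t)syscall ( SYS_gettid );   /* POSIX "thread id"  */
       printf ( "%s: tgid=%d tid=%d\n", who, (int)tgid, (int)tid );
       /* In the initial thread tgid == tid; in any other thread of the
          same process tgid stays the same while tid differs. */
    }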

> * Do you know offhand how LinuxThreads and NPTL use thread groups?

I believe that LT doesn't use the kernel's concept of thread groups at
all.  LT predates the kernel's support for proper POSIX-like sharing of
anything much but memory, so uses only the CLONE_VM (and possibly
CLONE_FILES) flags.  I don't _think_ it uses CLONE_SIGHAND -- it does
most of its work by propagating signals manually between threads.

NPTL uses thread groups as generated by the CLONE_THREAD flag, which is
what invokes the POSIX-related thread semantics.
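
For comparison, the flag sets involved look roughly like this (the NPTL
set is from memory of glibc's pthread_create and omits the TLS/tid
plumbing arguments; treat it as indicative rather than exact):

    #define _GNU_SOURCE
    #include <sched.h>

    /* LinuxThreads-style: separate tasks sharing memory only, so each
       "thread" is its own thread group (and its own pid). */
    static const int lt_like_flags   = CLONE_VM | CLONE_FILES;

    /* NPTL-style: CLONE_THREAD puts the new task in the caller's thread
       group; the kernel requires CLONE_SIGHAND with CLONE_THREAD, and
       CLONE_VM with CLONE_SIGHAND. */
    static const int nptl_like_flags = CLONE_VM | CLONE_FS | CLONE_FILES
                                       | CLONE_SIGHAND | CLONE_THREAD
                                       | CLONE_SYSVSEM | CLONE_SETTLS
                                       | CLONE_PARENT_SETTID
                                       | CLONE_CHILD_CLEARTID;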

> Is it the case that each LinuxThreads thread is in its own
> group whereas all NPTL threads [in a process] are in a single
> group?

Yes, that's my understanding.

--
dwmw2