\NeedsTeXFormat{LaTeX2e}
\documentclass[a4paper]{article}  % landscape confuses psnup...
% use -t landscape with dvips if landscape is selected!
% article*, report*, book*, letter, slides... (*: can do t.o.c.)

\input{standard.tex}

\newboolean{single}
\setboolean{single}{true}
 
% in standard.tex: for general emphasizing:
% \newcommand{\ROT}[1]{\Blue{\emph{#1}}}
% for introducing names of persons:
\newcommand{\R}[1]{\Red{\emph{#1}}}
% feeling green today:
\newcommand{\G}[1]{\Green{\emph{#1}}}
% or maybe blue?
\newcommand{\B}[1]{\Blue{\emph{#1}}}

\ifthenelse{\boolean{single}}
{
  \title{Three-site attachment experiment series: Perceived attachment}
  \author{Author of this section: Eric Auer}
  \newcommand{\SECONE}[0]{}
}{
  \newcommand{\SECONE}[0]{\section{Three-site attachment
     experiment series: Perceived attachment}}
}

\begin{document}
% \raggedbottom
% \raggedright % do not split words - looks better sometimes
\sloppy % prefer underfull to overfull hboxes

\ifthenelse{\boolean{single}}
{
\maketitle

% {\small Bla bla abstract bla one column small bla}

}{
% guess what: no title if this is part of the combined paper...
}

\begin{multicols}{2}


\SECONE

After completing the production experiments, we ran a small perceived
attachment test. Its main purpose was to verify that the subjects had
indeed been able to produce pronunciation patterns that enable a
listener to arrive at the intended reading of the sentences or fragments.

Each possible attachment of each test item had been uttered by \R{two}
different subjects in each of the two production experiments (with
and without relative clauses). Each of these recordings was then
listened to by \R{two} of us; only the native speakers among us took
part in this task.

This has the disadvantage that we, as listeners, were biased by knowing
what the experiment was about. For example, we knew that in each item
exactly one NP was the most important one. However, since we had not
memorized how the three possible cases were assigned to the speakers,
and there were no other clues to this assignment either, the setup was
still realistic enough. Again, the only goal of this experiment was to
verify that the subjects \R{did} manage to convey the NP selection
using their voice alone.

The results were recorded on questionnaires of roughly the same design
as those used for the pretest: for each item, we rated each of the
three cases by how strongly we believed that it was the intended one,
on a scale from $1$ to $5$. Normally, we would select $5$ for the
perceived attachment/stress and $3$ for the other two cases, but the
scales also \B{allowed} us to describe less clear cases.

As in the pretest, each set of results (one of us listening to one
speaker) was \R{normalized} to a mean of $0$ and a standard deviation
of $1$ before any further processing, for the same reasons as in the
pretest. Because our ratings were mostly uniform, the normalization in
most cases subtracted about $3.6$ and multiplied the results by a
factor of about $1.1$.
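
To illustrate these figures, consider an idealized listener who, as
described above, rates the perceived case of every item with $5$ and
the two other cases with $3$. Assuming that the standard deviation of
such a set is the population standard deviation over all of its
ratings, the normalization $z(x) = (x - \bar{x})/s$ yields:
\begin{eqnarray*}
\bar{x} & = & \frac{5+3+3}{3} = \frac{11}{3} \approx 3.67\\
s & = & \sqrt{\frac{(4/3)^2 + 2\,(2/3)^2}{3}}\\
  & = & \frac{2\sqrt{2}}{3} \approx 0.943\\
z(5) & = & \frac{5-\bar{x}}{s} = \sqrt{2} \approx 1.414\\
z(3) & = & \frac{3-\bar{x}}{s} = -\frac{1}{\sqrt{2}} \approx -0.707
\end{eqnarray*}
The shift of about $3.67$ and the scale factor $1/s \approx 1.06$ are
close to the typical values quoted above, and the resulting $1.414$ and
$-0.707$ are close to the values in the NP2 row of the table for the
experiment without relative clauses.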

From the $192$ data points ($4 \times 48$) per intended
attachment/stress in each of the two experiments, we computed the
\B{mean} and \B{standard deviation} of how strongly we believed in
each of the three possible intentions. The results are as follows
(mean, with standard deviation in parentheses):

\end{multicols}


\newcommand{\VAL}[2]{$ #1 \quad (#2) $}

\begin{tabular}{|r|l|l|l|}
\hline
& \multicolumn{3}{c|}{Production task without relative clauses} \\
\hline
Intended: & \multicolumn{3}{c|}{Perceived:} \\
& NP1 & NP2 & NP3 \\
\hline
NP1 & \R{\VAL{1.420}{0.287}} & \VAL{-0.673}{0.233} & \VAL{-0.701}{0.010} \\
\hline
NP2 & \VAL{-0.702}{0.009} & \R{\VAL{1.445}{0.101}} & \VAL{-0.702}{0.009} \\
\hline
NP3 & \VAL{-0.665}{0.269} & \VAL{-0.603}{0.436} & \R{\VAL{1.170}{0.628}} \\
\hline
\end{tabular}

\begin{tabular}{|r|l|l|l|}
\hline
& \multicolumn{3}{c|}{Production task with relative clauses} \\
\hline
Intended: & \multicolumn{3}{c|}{Perceived:} \\
& NP1 & NP2 & NP3 \\
\hline
NP1 & \R{\VAL{1.239}{0.694}} & \VAL{-0.677}{0.174} & \VAL{-0.505}{0.603} \\
\hline
NP2 & \VAL{-0.186}{0.913} & \R{\VAL{0.473}{1.008}} & \VAL{-0.373}{0.742} \\
\hline
NP3 & \VAL{-0.462}{0.661} & \VAL{-0.435}{0.675} & \R{\VAL{0.926}{0.963}} \\
\hline
\end{tabular}


\begin{multicols}{2}

At first glance, the tables give the impression that perceiving the
intended stress/attachment was not easy, but possible. For the
experiment \B{without relative clauses}, the intention \B{NP3 stressed}
seems to cause the most uncertainty: its row has the largest standard
deviations. For the experiment \B{with relative clauses}, the intended
\B{NP2 attachment} causes a considerable amount of uncertainty (a mean
of only $0.473$ for the correct reading, with a standard deviation of
about $1$), while the other cases seem to pose only moderate
difficulty for the listener.

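The \R{F-values}\footnote{
For a table row with the means $\bar{x}_1, \bar{x}_2, \bar{x}_3$ and
the standard deviations $s_1, s_2, s_3$ of the three perceived cases,
let $M = \frac{1}{3}\sum_{i=1}^{3} \bar{x}_i$ be the mean of the means.
Then $\mathit{MSB} = \sum_{i=1}^{3} (\bar{x}_i - M)^2$,
$\mathit{MSW} = \sum_{i=1}^{3} 192\, s_i^2$ and
$F = \frac{189 \times \mathit{MSB}}{2 \times \mathit{MSW}}$.
} calculated for each row of the tables indicate far more serious
problems: for the experiment without relative clauses, the F-values
per intention are \B{$\mathrm{NP1}=10.653$, $\mathrm{NP2}=146.024$,
$\mathrm{NP3}=1.628$}, and for the experiment with relative clauses
they are \B{$\mathrm{NP1}=1.263$, $\mathrm{NP2}=0.081$,
$\mathrm{NP3}=0.341$}.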
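
As a worked instance of the footnote's definition, take the NP1 row of
the table for the experiment without relative clauses, with the means
$1.420$, $-0.673$, $-0.701$ and the standard deviations $0.287$,
$0.233$, $0.010$, so that $0.287^2 + 0.233^2 + 0.010^2 = 0.136758$:
\begin{eqnarray*}
3M & = & 1.420 - 0.673 - 0.701\\
   & = & 0.046\\
\mathit{MSB} & = & 1.4047^2 + (-0.6883)^2\\
   & & {} + (-0.7163)^2 \approx 2.960\\
\mathit{MSW} & = & 192 \times 0.136758 \approx 26.26\\
F & = & \frac{189 \times 2.960}{2 \times 26.26} \approx 10.65
\end{eqnarray*}
This agrees with the reported value $10.653$ up to rounding.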

This can be interpreted to mean that whenever there was a relative
clause at all, or whenever the speaker tried to stress the \R{third}
NP in a sequence \B{NP1 prep NP2 prep NP3}, the listener had only a
minimal chance of perceiving the intended stress/attachment
significantly better than by \R{guessing}.
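
The comparison behind this interpretation can be made explicit.
Assuming a significance level of $5\%$ and reading the factors $2$ and
$189$ in the footnote's formula as the degrees of freedom, the critical
value is $F_{0.05}(2,189) \approx 3.04$. For the experiment without
relative clauses, the F-values then satisfy
\[
F_{\mathrm{NP1}},\ F_{\mathrm{NP2}} > 3.04 > F_{\mathrm{NP3}},
\]
whereas for the experiment with relative clauses all three lie below
the threshold:
\[
3.04 > F_{\mathrm{NP1}} > F_{\mathrm{NP3}} > F_{\mathrm{NP2}}.
\]
Under these assumptions, only the NP1 and NP2 intentions in the
experiment without relative clauses would count as reliably perceived.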

Nevertheless, we went on to analyze the speech recordings for
\B{pitch, volume and duration (lengthening and pauses)} patterns.
This analysis was meant to reveal the means the speakers used while
\R{trying} to convey the intended stress/attachment, even though, as
we have seen, their success was quite limited in some cases.

\end{multicols}

\end{document}