Mica2 Radio Communication Analysis
M.Schippling -- v0.1 -- 12/16/2005
schip@etantdonnes.com


Introduction

This is a synopsis of my exploration into the communication speed and reliability of the Mica2 Mote from Xbow technologies running the TinyOS embedded operating system. The mica2 uses a Chipcon CC1000 RF transceiver chip. My application is in robotics telemetry and control over a fairly short range, so the focus of this study is a bit different from the usual Sensor Net setup.

Test System

The test system comprises a Windows XP PC connected to a MIB510 programmer/basestation. The basestation contains a mica2 programmed with a slightly modified (see below) TOSBase program. There are five re-Mote devices contained in individual robot cars, which are powered by batteries or from their wall chargers. They are running my standard robot control program that samples I/O ports under a 1ms timer, so there is a good amount of code running "in the background" of the message loop, which is also described below. The devices I used for this system run at 900MHz and were programmed under the contrib/xbow/apps tree using TOS 1.1.7. These xbow compile-time radio options were used:
Which boil down to running at 903MHz with maximum power output.

Message Loop

The robot system relies on commands being sent from the PC host followed by responses from the individual re-Mote devices using point-to-point addressing. Each command is sent to a specific device and the device responds directly to the basestation sender. In the robot system, a command initiates a motion and the response indicates the result of the motion, so there is usually some unpredictable delay between send and receive at the host. But in this test system I use an immediate response "status request" command which is serviced by the re-Mote as soon as it is received. Thus the only "turn-around" delays are caused by interleaved interrupt handling. (As alluded to, there is a 1ms timer loop which samples ADCs and digital input ports, so there can be some significant variation in response time. However, this can be viewed as 'normal' behavior in a sensor system and may not have a large effect on the messaging. We hope.)

The command/response structure allows the host to drive the timing of the message loop at any rate it sees fit. The re-Mote devices are purely slaves and do not send any unbidden messages of their own. This is both good and bad; some of the minuses are dealt with in the Experiments section below.
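A minimal sketch of such a host-driven loop, in Python rather than the Java actually used on the host. The names poll_round_robin and exchange are hypothetical; exchange stands in for "send one command to a re-Mote and wait for its reply or a timeout", which in the real system goes through the serial port and TOSBase.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def poll_round_robin(
    mote_ids: Iterable[int],
    exchange: Callable[[int], Optional[bytes]],
    delay_s: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, bool]]:
    """Address each re-Mote in turn and log whether its reply arrived.

    exchange(mote_id) is an assumed transport hook: it sends one command
    and returns the reply bytes, or None if the cycle was dropped.
    """
    ids = list(mote_ids)
    log: List[Tuple[int, bool]] = []
    for i in range(cycles):
        mote = ids[i % len(ids)]        # round-robin point-to-point addressing
        reply = exchange(mote)          # the host alone decides when to talk
        log.append((mote, reply is not None))
        sleep(delay_s)                  # delay from end of one send to start of next
    return log
```

Because the re-Motes never speak unbidden, the only timing knob is delay_s, and a dropped cycle shows up simply as a missing reply in the log.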

Messages

The exchanged messages are full sized TOSMsg structures of 36 bytes each. All messages contain (as part of the standard header) the destination MoteID, plus my additional tracking fields (the numbers are the byte offset in the full message):
In addition to the above, the data area of the Status Request message contains some control information which can disable my cobbled-together message recovery system. The data area of the Status Response contains a bunch of robot specific stuff, plus these few measurement fields:
The above fields allow the point-to-point mechanism to work correctly, and give an indication of where messages may have been lost when they are dropped.

Base Station

The basestation used the standard TOSBase code compiled under the xbow tree. However, I have made two modifications to the code in order to use message ACKs. The first is to enable ACKs using MacControl.enableAck(); in the StdControl.init() method. The second is to modify RadioSend.sendDone() to pass on the radio ACK rather than just returning the serial receive ACK. This allows the host to know if the actual radio send was acknowledged.

Also, the host-side tools/java/net/tinyos/packet/Packetizer.java class was modified to shorten the ACK wait timeout from 1 second to 50ms. This is the maximum time, measured from the start of the send, that the message send method will wait for an ACK from the re-Mote via TOSBase. This has an effect on the measured send times, but it is also about the shortest time that we can spend hoping to get the ACK.

Environment

The test system is set up in my attic...it sounds bad but actually it's only a little chilly. The one piece of perhaps useful information is that there is a metal roof directly above which may affect radio communications, but that can't be helped. The attic is available for rent....

There are five re-Mote devices placed in a 2 meter semi-circle around the basestation and separated by 0.7 meters each. The furthest devices are also about 2 meters apart. At this distance they should all be able to hear each other, so we hope there are no significant Hidden Node issues.

image of test setup

Since this is an operational robot system I have used the existing antennas, which are not entirely symmetrical. The basestation has a 1/4-wave whip recommended by Xbow (DigiKey...err....). The re-Motes have little 900MHz nubbins, also from DigiKey ...number ahhh.... In general there seems to be no issue with communication between these devices at reasonable power.

images of antennas

Experiments

Using this system I performed a number of experiment runs to evaluate the message speed and reliability. In each run the message send/receive loop was executed for five minutes using different parameters. The variables explored were:
This is the delay from the end of one send to the beginning of the next. In some situations (e.g., when ACKs are used) the send may take more or less time so this is not a fixed loop time. It is added to the send time on each cycle, so there is 20-40 ms of jitter in the overall message loop timing.
This has some significant effects which are described below.
This also had some significant and surprising effects. As it stands the robot communication system is using ACKs to try to tame the unreliable nature of the mica2 radio. However, this may not be a good idea after all.

Results

RoundRobin Message Cycle Using ACKs

In this set of experiments messages are sent point-to-point in a round-robin fashion. A number of runs were tried with different numbers of re-Mote devices and varying delay times between command sends. With sufficient time between command sends, each status reply should not have to deal with collisions, so this should show best case responses. The ACK mechanism was turned on, but in general ACKs were missing on 15% to 20% of the commands and for under 1% of the replies. Neither of these 'facts' has been analyzed in any detail, so I wouldn't put a lot of faith in them...

The glossy overview graphs:

graph

Above shows the relationship between the message delay and the number of message cycles that were dropped. What we see is that, without throttling the send, a very high percentage of messages do not get through. From examination of the raw data it appears that most messages are dropped at the re-Mote end (commands are not received). This could be due to the different antennas being used on basestation and re-Mote.


graph

Above shows the overall number of successful messages that can be exchanged versus the percentage that fail to get through. It shows that around 10 exchanges per second can be expected with a less than 3% failure rate. This particular value used a 60ms delay between command sends. If higher failure rates are acceptable one could get up to 13 or so msg/sec but with a LOT of dropped messages. This would be using a much shorter inter-command delay time.

Remember that "messages" here refers to "message cycles" of one command and one reply, so it appears that 20 individual messages per second is a good ballpark speed estimate. This assumes that there is very little chance of transmissions overlapping or having to back off using CSMA.
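The arithmetic behind that ballpark can be sketched as follows. The function name cycle_stats is hypothetical, and the "overhead" it returns is only inferred by subtracting the inter-command delay from the cycle period; it was not measured separately here.

```python
def cycle_stats(delay_ms: float, cycles_per_sec: float):
    """Rough per-cycle numbers from an observed cycle rate.

    Returns (period_ms, overhead_ms, msgs_per_sec), where overhead_ms is
    whatever part of each cycle is not the inter-command delay (send,
    ACK wait, turn-around), inferred rather than measured.
    """
    period_ms = 1000.0 / cycles_per_sec      # time per command/reply cycle
    overhead_ms = period_ms - delay_ms       # everything except the delay
    msgs_per_sec = 2 * cycles_per_sec        # one command plus one reply
    return period_ms, overhead_ms, msgs_per_sec


# About 10 cycles/sec at a 60ms delay, as read off the graph above
period, overhead, msgs = cycle_stats(60, 10)
```

With those inputs the cycle period comes out at 100ms, leaving roughly 40ms per cycle for the send and reply, which is in the same range as the 50ms ACK timeout and the 20-40ms send-time jitter mentioned earlier.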

Broadcast Message Cycle Using ACKs

In this set of experiments commands are sent as Broadcast messages (using the destination ID -1) and are sent only once per cycle. There are five re-Motes and all are expected to receive and reply to each command, hopefully before another command cycle begins. Replies are still point-to-point, in that they have the basestation address as a destination ID. Message ACKs are enabled on both ends but the broadcast commands (with destination -1) are never ACKed. However, the replies received at the basestation are ACKed and this seems to lead to some unpleasant results.

The useful thing about using Broadcast commands is that it triggers all the re-Motes to reply at the same time, so it shows how effective the CSMA mechanism is in interleaving multiple media accesses.

The glossy overview graphs:

graph

Above again shows the relationship between the message delay and the number of message cycles that were dropped. And again, what we see is that, without throttling the send, a very high percentage of messages do not get through. But we also see that the best success rate is still horrible, with 10-30% of the message cycles failing. From the raw data it appears that somewhere between 10% and 30% of those failures are due to the command not being received and the rest are replies not getting back through.

The extremely high failure rate was troubling, so I tried the next set of experiments...


Broadcast Message Cycle NOT Using ACKs

In this set of experiments, as in the previous set, commands are sent as Broadcast messages (using the destination ID -1) and are sent only once per cycle. There are five re-Motes and all are expected to receive and reply to each command, hopefully before another command cycle begins. However, in this set ACKs are disabled at both ends, basestation and re-Mote. These runs were done with an inter-command delay of 350ms, which is more than enough to allow for all re-Motes to reply. (The worst case message cycle time was about 210ms -- from beginning of transmit to end of receive). I did not do a set of varying delay runs, but it would seem that about 200ms between messages (150ms delay) would be optimum with five re-Motes.
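One way to see where the 200ms figure comes from is a simple time budget: one slot per replying re-Mote plus the time to get the command out. This is only a sketch with a hypothetical function name. The 30ms slot is the upper end of the 20-30ms reply interleaving seen in the raw data, and the 50ms send time is inferred from 200ms minus the 150ms delay rather than measured.

```python
def broadcast_period_ms(n_motes: int, slot_ms: float, send_ms: float) -> float:
    """Estimated time between broadcast commands.

    slot_ms: assumed time for one re-Mote's reply to get through CSMA
    send_ms: assumed time to send the broadcast command itself
    """
    reply_window_ms = n_motes * slot_ms   # this is the inter-command delay
    return send_ms + reply_window_ms


# Five re-Motes, 30ms per reply, 50ms (inferred) for the command send
period = broadcast_period_ms(5, 30, 50)
```

That gives a 150ms reply window and a 200ms period, a little under the 210ms worst case cycle actually observed, so some margin on top of this estimate is probably wise.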

The glossy overview graphs:

graph

This graph shows the rate of message dropping versus the number of devices receiving the broadcast command. I bypassed some of the more detailed experiments in order to get a feel for the system scaling with the number of re-Motes. In general this experiment showed that without ACKs the broadcast mechanism works much more reliably. Where we had 20% or worse loss with the ACKs turned on in the basestation, now it is about 1/3 that with the same number of devices competing for airwaves. Why this should be so is left as an exercise for the reader...

The pink "Xmit loss" trace is a scaled indication of how many failures are due to commands not being received. A value of 1 means 100% of failures are due to the command not getting through. The ratio of 'transmission' failures decreases as devices are added, but the relative number per device remains about the same under all circumstances.

The interesting thing here is that the message loss seems to scale linearly with the number of devices, assuming that enough time is allowed for all devices to respond. Also, the difference between a 1% drop with one device and 6.5% with five shows that the CSMA mechanism is not foolproof. This is a worst case scenario because all of the re-Motes are trying to access the airwaves at the same time, so perhaps 6 or 7% loss is not so bad. The raw data shows that message replies are generally interleaved in 20-30ms steps. No further analysis to categorize failures was done.
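If the linear trend is taken at face value, the two endpoints quoted above (1% with one device, 6.5% with five) give a slope of about 1.4% extra loss per added re-Mote. The sketch below just draws a straight line between those two points; the function name is hypothetical, and the in-between values are interpolation, not measured data.

```python
def interpolated_loss_pct(n_motes: int) -> float:
    """Straight-line estimate of broadcast cycle loss (percent).

    Uses only the two endpoints quoted in the text; anything outside
    one to five re-Motes is extrapolation and may well be wrong.
    """
    loss_one, loss_five = 1.0, 6.5
    slope = (loss_five - loss_one) / (5 - 1)   # percent per added re-Mote
    return loss_one + slope * (n_motes - 1)


# Estimated loss for one through five re-Motes
estimates = [interpolated_loss_pct(n) for n in range(1, 6)]
```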

RoundRobin Message Cycle NOT Using ACKs

Just for completeness I did a set of runs using roundrobin command addressing, but with all ACKs disabled. This is not as rigorous as the first experiment set above, but shows similar behavior. It was done with five re-Motes and a varying command delay time. As one might expect, due to the absence of the ACK send and receive wait, a somewhat faster message cycle time is possible.

The glossy overview graphs:

graph

This is the same type of graph as in the first experiment series. It shows the dependence of successful message cycles on the command cycle delay time. It also shows that a much shorter delay can be used where there are no ACKs.


graph

Again, this is the same type of graph as in the first experiment series. It shows the number of successful messages one can expect versus how many fail. It shows that a few more message cycles are possible for about the same error rate when there are no ACKs. The best rate with fewest failures was using a 25ms delay (or about 75ms total message cycle time) and gave close to 13 messages per second with less than 2% failure, of which about 20% were lost command transmissions. Again each message cycle is a set of two mica2 messages, so the actual message rate is twice what is shown on the graph.
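As a sanity check on those numbers, the successful cycle rate can be estimated from the total cycle time and the failure rate. This is a back-of-envelope sketch with a hypothetical function name, taking the 2% failure figure as an upper bound.

```python
def successful_cycles_per_sec(cycle_ms: float, failure_rate: float) -> float:
    """Attempted cycles per second, scaled by the fraction that succeed."""
    attempts_per_sec = 1000.0 / cycle_ms
    return attempts_per_sec * (1.0 - failure_rate)


# About 75ms per cycle (25ms delay) with up to 2% of cycles failing
cycles = successful_cycles_per_sec(75, 0.02)
messages = 2 * cycles   # each cycle is one command plus one reply
```

This works out to just over 13 successful cycles, or about 26 individual messages, per second, which agrees with the figure read off the graph.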

Almost Raw Results and Test Code

The spread sheet with the accumulated data used to generate these graphs is here:
The raw-raw data, each message transaction timing and such, is available, as is the TOS and Java code used to run the tests. I'm just too lazy to get it up right now...

Conclusions

Using a message request/reply system where a basestation requests a message from each re-Mote, by either individual point-to-point request and reply or broadcast request with a p-t-p reply, I have arrived at the following conclusions. These are under generally best case conditions where five re-Mote mica2s and one base station are within 2 meters of each other:

Point-to-point request messages should be throttled to less than about 10 per second or many replies will be dropped.

Broadcast request messages should be run without message ACKs enabled and should be throttled such that there is about 20-30ms per re-Mote to allow for replies. If the request timing is too short many replies will be dropped.

With message ACKs enabled in a point-to-point roundrobin mode where there is little chance of transmit overlap and CSMA backoff, about 20 successful individual messages per second can be sent with less than 3% failure rate. However, about 20% of the message ACKs are never received.

When ACKs are enabled in a semi-broadcast mode where there is a good chance of CSMA backoff, message failures increase drastically, up to a 30% failure rate in some cases. See Experiments for details.

Without ACKs, in a roundrobin mode where there is little chance of transmit overlap and CSMA backoff, successful messages are around 26 per second with less than 2% failure rate.

Without ACKs in a broadcast mode where there is maximum chance of transmit overlap and CSMA backoff, successful messages are around 25 per second but the failure rate varies linearly from 1% with one re-Mote to 6.5% with five re-Motes.
I don't know how much the failure rate linearity can be extrapolated.
