Undergraduate Research
May 2002 - December 2002
Advisor - Dr. Jafar Saniie
Partner - Christopher Middendorf


Abstract:
The purpose of this research project was to become familiar with the ATMEL family of 8-bit RISC microcontrollers and to design a TV Game Console. We implemented the game TIC TAC TOE. The ATMEGA163 microcontroller was chosen for this purpose and was used to generate TV signals, which were then fed to a TV Monitor. The main program resided in the microcontroller's EEPROM.

Main Hardware Components
The main components required to generate video signals are:
1. The ATMega163 microcontroller mounted on the STK500 development kit.
2. An 8 MHz crystal, which is required for the microcontroller to run at its full speed of 8 MHz.
3. Video DAC, which is a network of 3 resistors and a transistor and is used to generate the three video levels as shown.

Two bits of a port are used to generate three video levels:
• Sync level = 0 volts
• Black level = 0.3 volts
• White level = 1.0 volts

Video DAC

Problems Encountered

1. Video Signal Standard

There are three major TV standards: NTSC, SECAM and PAL. NTSC (short for "National Television System Committee") is the American TV standard; it has only 525 scan lines, but an update frequency of 30 Hz. SECAM (short for "Sequential Color with Memory") is the French TV standard; it has improved color stability and higher intensity resolution, but lower color resolution. The European standard is SCART (Syndicat des Constructeurs d'Appareils Radiorécepteurs et Téléviseurs). In the beginning we did not realise that we were focusing on the wrong standard, SCART, and as a result we were unable to find proper information and equipment to implement the project. After a lot of research on this standard we had to switch to the American standard, NTSC. A lot of time was wasted running in the wrong direction.

2. Timing

The timing has to be very precise to generate video signals properly. Even though the ATMega163 is capable of running at 8 MHz, it was running at only 3.6 MHz, because the microcontroller runs at 8 MHz only when an 8 MHz crystal is plugged into the STK500 board. Installing the crystal resolved a lot of timing issues.
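To illustrate why the clock speed matters, here is a rough sketch of the cycle budget per scan line. It assumes the nominal NTSC line rate of about 15,734 Hz and a horizontal sync pulse of about 4.7 µs; the function names are hypothetical and are not taken from the project code.

```python
# Nominal NTSC horizontal line rate in Hz (assumed here, not stated in the report).
NTSC_LINE_RATE_HZ = 15_734.264


def cycles_per_scanline(clock_hz, line_rate_hz=NTSC_LINE_RATE_HZ):
    """Return how many CPU clock cycles fit into one scan line."""
    return clock_hz / line_rate_hz


def cycles_for_interval(clock_hz, seconds):
    """Return how many CPU clock cycles fit into a time interval."""
    return clock_hz * seconds


if __name__ == "__main__":
    for clock in (3_600_000, 8_000_000):
        line = cycles_per_scanline(clock)
        sync = cycles_for_interval(clock, 4.7e-6)
        print(f"{clock / 1e6:.1f} MHz: {line:.1f} cycles per line, "
              f"{sync:.1f} cycles per sync pulse")
```

At 8 MHz there are roughly 508 cycles available per line, against roughly 229 at 3.6 MHz, which leaves far less room for the software to place each signal transition accurately.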

3. TV Signal or Video Signal?

Chris brought in his TV so we could test the software. We spent a lot of time and effort analyzing the output signals from the ports, which looked fine. We could see the video and sync signals on the digital oscilloscope, but we still had nothing on the TV. After a good number of days spent debugging the program and checking the hardware over and over again, we realised that we were using the wrong kind of TV. A regular TV requires TV signals, while a TV Monitor requires video signals. We realised this in time, however, and tested the software with a TV Monitor. And voila! We had a bright checkerboard pattern on the screen.

4. Software Availability

A major hindrance we faced towards the end of the semester, when we started writing the game, was the availability of software. We were initially working with the Student Version of the CodeVision AVR C Compiler, which has code size limitations. In the beginning, since we were writing only small programs, there was no need to buy the full version. Once the size of the code started increasing, we could no longer use the student version. We decided to switch to the GNU C Compiler, which is free. The problem with this compiler is that it is very hard to use, and learning all the tips and tricks was taking a lot of time. We presented our problem to the Department, which was generous enough to buy the full Standard Version of the CodeVision C Compiler.

The complete report is available here.


Computer Society International Design Competition (CSIDC) 2003
Real-time Parking Analysis Solution System (RPASS)

Advisor - Dr. Donald Ucci
Team Members - Raghu Kutty, Nayana Samaranayake

Abstract: This project introduced a Real-time Parking Analysis Solution System (RPASS), which is a revolutionary development in parking space management. Over the past few years, parking space has become a major issue in large metropolitan areas around the globe. While most cities have constructed multistory parking lots at their heart, people still find it hard to locate a parking lot with free space without actually visiting the location. Even when they find such free spaces, they end up, more often than not, a few blocks from their target destination.

Our proposal was a Real-Time Parking Analysis Solution System (RPASS). This technology would solve many problems associated with parking and provide the customer with easy access to information in real time. Unlike conventional systems, where the customer has to search for parking lots beforehand and reserve spaces, the proposed technology would allow the user to gain access to real-time information on parking spaces within a certain radius of his or her current location or intended destination.

System Overview

The system was broken down into four main components to facilitate modular development: Sensor Control Units, a Parking Box (P-Box), a Central Server and a Client handheld device. Figure 1.1 shows how the four components interact with each other.

Sensor Control Units: Sensor Control Units are an essential part of the system. The sensors can be photo sensors or pressure sensors and provide the core information regarding the occupancy of a parking lot. Positioned at the entrances and exits, the units transmit a signal to the P-Box as a car enters or exits a lot.

P-Box:
A single P-Box in a parking lot handles the functioning of all Sensor Control Units in that parking lot. The P-Box is responsible for providing information on the occupancy of a parking lot to the Central Server. The P-Box collates all signals transmitted by the sensor units in the parking lot.
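As a minimal sketch of the collation step described above, the P-Box can be modelled as a counter that is updated by entry and exit signals and reports occupancy on request. The class name, method names and report fields are hypothetical; the project write-up does not specify them.

```python
class PBox:
    """Toy model of a P-Box that tracks occupancy from sensor signals."""

    def __init__(self, lot_id, capacity):
        self.lot_id = lot_id
        self.capacity = capacity
        self.occupied = 0

    def on_entry(self):
        """Handle a signal from an entrance sensor unit."""
        # Clamp so a spurious signal cannot push the count past capacity.
        self.occupied = min(self.capacity, self.occupied + 1)

    def on_exit(self):
        """Handle a signal from an exit sensor unit."""
        # Clamp so a spurious signal cannot push the count below zero.
        self.occupied = max(0, self.occupied - 1)

    def report(self):
        """Build the status record that would be sent to the Central Server."""
        return {
            "lot_id": self.lot_id,
            "capacity": self.capacity,
            "free": self.capacity - self.occupied,
        }


if __name__ == "__main__":
    box = PBox("lot-A", capacity=3)
    for _ in range(2):
        box.on_entry()
    box.on_exit()
    print(box.report())
```

Clamping the counter is a design choice made for this sketch only; a real unit would also need to handle lost or duplicated sensor signals.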

Block Diagram of the system

Central Server:
The central server is responsible for maintaining the status of all parking lots using the information sent out by the P-Boxes. The server would run a database engine to keep track of this information, and would also host a web service to provide this information to the Client. Depending on the size of the system, the database and web service could be on two different machines. The database contains information such as space availability, location of the parking lot and cost, which is made available to the public through the web service. To meet the needs of PDA, cellular phone and other handheld device users, the web service would use WML.

Client Device:
Any device that supports Wireless Markup Language (WML), which is part of the Wireless Application Protocol (WAP), can be used as a Client Device. PDAs, cellular phones and many other handheld devices are currently WAP enabled. On the Client side, the web service can be accessed from a desktop or any WAP-enabled mobile platform. The user would be able to provide the zip code or the destination location name (such as “Midway Airport”) through the web service, and get information on parking lots around that location. Future versions of the system could also use GPS technology to transmit the GPS coordinates of the current location.
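To give an idea of what the web service could send to a WAP client, the sketch below renders a list of parking lots as a single WML card. The record fields (`name`, `free`, `cost`) and the function name are assumptions made for illustration; the actual page layout used in the project is not described here.

```python
import xml.etree.ElementTree as ET

# Standard WML 1.1 document type declaration.
WML_DOCTYPE = (
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">'
)


def render_wml(lots, title="Parking"):
    """Render parking lot records as a one-card WML deck (hypothetical layout)."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", id="lots", title=title)
    for lot in lots:
        # One paragraph per lot keeps the page readable on a small screen.
        paragraph = ET.SubElement(card, "p")
        paragraph.text = (
            f"{lot['name']}: {lot['free']} free, ${lot['cost']:.2f}/hr"
        )
    body = ET.tostring(wml, encoding="unicode")
    return '<?xml version="1.0"?>\n' + WML_DOCTYPE + "\n" + body


if __name__ == "__main__":
    sample = [{"name": "Lot A", "free": 12, "cost": 2.5}]
    print(render_wml(sample))
```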


8-Bit POWERALU

In this project, the chip was required to implement the functions of a basic calculator, using Goldschmidt’s division (division by convergence) as an efficient method for approximating the reciprocal. The Goldschmidt algorithm can be applied to a variety of different systems and is also suitable for computing square roots. Some of the requirements are listed below:

  • The overall chip size was limited to 1.5mm by 1.5mm to fit within a pre-designed 40-pin pad frame and pad cells.
  • Scan logic was required to provide a scan path from a "Scan In" pin on the chip, through all sequential logic elements, to a "Scan Out" pin.
  • A separate "Scan Mode" input pin was also used to put sequential logic elements in test mode.
  • The chip was designed for operation at 10 MHz.

Although division is the least frequent of the four basic arithmetic operations, a recent study shows that in a typical numerical program, the time spent performing divisions is approximately the same as the time spent performing additions or multiplications. This is because in most current processors, division is significantly slower than the other operations. Hence, faster implementations of division are desirable. There are two principal classes of division algorithms. The digit-recurrence methods produce one quotient digit per cycle using a residual recurrence, which involves (i) redundant additions, (ii) multiplications by a single digit, and (iii) a quotient-digit selection function. The latency and complexity of the implementation depend on the radix. The method produces both the quotient, which can be easily rounded, and the remainder. The iterative, quadratically convergent methods, such as the Newton-Raphson, Goldschmidt and series expansion methods, use multiplications and take advantage of fast multipliers.
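The idea behind Goldschmidt's method can be sketched in a few lines. This is a floating-point illustration only, not the 8-bit hardware implemented on the chip: the divisor is scaled into the range [0.5, 1), and numerator and divisor are then repeatedly multiplied by the same factor until the divisor converges to 1, at which point the numerator holds the quotient. The function name and iteration count are choices made for this sketch.

```python
def goldschmidt_divide(numerator, divisor, iterations=5):
    """Approximate numerator / divisor by Goldschmidt's convergence method."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")

    # Work on magnitudes and restore the sign at the end.
    negative = (numerator < 0) != (divisor < 0)
    n, d = abs(numerator), abs(divisor)

    # Scale both operands by powers of two so that d lies in [0.5, 1).
    # The ratio n / d is unchanged by this scaling.
    while d >= 1.0:
        n, d = n / 2.0, d / 2.0
    while d < 0.5:
        n, d = n * 2.0, d * 2.0

    # Each step multiplies n and d by f = 2 - d. If d = 1 - e, the new
    # divisor is 1 - e*e, so the error is squared on every iteration.
    for _ in range(iterations):
        f = 2.0 - d
        n, d = n * f, d * f

    return -n if negative else n


if __name__ == "__main__":
    print(goldschmidt_divide(7, 3))
```

Because the error squares at each step, a handful of iterations is enough here; a hardware version would typically use fixed-point multipliers and a fixed, small number of iterations.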

DESIGN FLOW METHODOLOGY:
The design flow steps highlighted in the lab manual were carefully followed to ensure successful completion of the project.

Architecture Design

The different subsystems were designed and their specifications were written. The state diagrams for the FSM were also specified at this stage.

Component Design
Verilog descriptions were written for each component in the architecture design according to the detailed specification. Each component was simulated and verified. The block schematics were then synthesized using Synopsys, and the simulation data obtained was also verified.

Assembled Chip Core
All subsystems were connected together, and more simulation tests were carried out to verify the integrity of the integrated components. The entire chip was then simulated with IRSIM to check for correctness.

SYNTHESIS AND PLACE AND ROUTE (P & R) FLOW
A simple script (compile.scr) was modified to perform the synthesis of the cells designed in Verilog. The synthesis was performed using Silicon Ensemble; “dc_shell” was the command used to carry out the synthesis. After running dc_shell, some extra pads (corner and supply pads) were added to get exactly 40 pads for a MOSIS padframe. The script “iit06_fixpads” was run to undertake this task. We ran gds2mag to carry out the place and route. The layout was thoroughly checked for DRC errors. After extracting the layout in Magic, the script “polyres_convert” was executed to deal with the resistors in the pad layout.


Custom Design Layout - 8-Bit Full Adder

Abstract - In this project, an 8-bit full adder/subtractor circuit was built in Sue, laid out in Magic and simulated extensively with IRSIM. Gemini was also used to do an LVS (Layout vs. Schematic) before simulation. Results of the analysis of the final design done using HSPICE include the propagation rise and fall times and the critical path delay.

Main Components

The 8-bit full adder/subtractor circuit was designed using the following components: transmission gates, a positive latch, a register, an XOR gate, a full adder and an AND gate. The first three are described in the following paragraphs.

Transmission Gate (TG): A CMOS transmission gate consists of an nFET Mn in parallel with a pFET Mp. The TG is designed to act as a voltage-controlled switch. The gates are controlled by the complementary voltages Vg, applied to the nFET, and (Vdd-Vg), applied to the pFET. When Vg is high, both Mn and Mp are biased into conduction and the switch is closed. If Vg is low, both MOSFETs are in cutoff and the switch is open. The TG is used to build the 2-1 multiplexor. Figure 1 shows a TG.


Latch: There are many approaches to constructing latches. One very common technique involves the use of transmission gate multiplexors. Figure 2 shows a multiplexor-based latch. In this latch, the D input is selected when the clock is high, and the output is held when the clock is low. When Clk is high, the bottom transmission gate is on and the D input is copied to the Q output. During this phase, the feedback loop is open since the top transmission gate is off.

Register: The master-slave configuration is commonly used to design an edge-triggered register. Figure 3 shows the register using this configuration. On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output. During this period, the slave stage is in hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stage stops sampling the input and the slave stage starts sampling. During the high phase of the clock, the slave stage samples the output of the master stage, while the master stage remains in hold mode. The value of Q is the value of D right before the rising edge of the clock, achieving the positive edge-triggered effect.
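The behaviour of the latch and the master-slave register described above can be checked with a small behavioural model. This is a logic-level sketch only, with hypothetical class names; it ignores transistor-level effects such as delay and charge sharing.

```python
class Latch:
    """Level-sensitive latch: transparent at one clock level, holds otherwise."""

    def __init__(self, transparent_level):
        self.transparent_level = transparent_level
        self.q = 0

    def update(self, clk, d):
        # When transparent, the multiplexor selects D; otherwise it selects
        # the fed-back output, so Q keeps its previous value.
        if clk == self.transparent_level:
            self.q = d
        return self.q


class EdgeTriggeredRegister:
    """Positive edge-triggered register built from two latches."""

    def __init__(self):
        self.master = Latch(transparent_level=0)  # samples D while Clk is low
        self.slave = Latch(transparent_level=1)   # samples master while Clk is high

    def update(self, clk, d):
        master_out = self.master.update(clk, d)
        return self.slave.update(clk, master_out)


if __name__ == "__main__":
    reg = EdgeTriggeredRegister()
    for clk, d in [(0, 1), (1, 0), (1, 0), (0, 0), (1, 1)]:
        print(f"clk={clk} d={d} -> q={reg.update(clk, d)}")
```

In the run above, Q takes the value that D had just before each rising clock edge, and changes on D during the high phase have no effect.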

Figure 1: Transmission Gate

Figure 2: Multiplexor based positive latch

Figure 3: Edge-triggered register using multiplexors

The top level block diagram of the circuit to be designed is shown in Figure 4.

The first step was to complete the design in SUE. The top level hierarchy was built in the following stages listed in the order of increasing level of hierarchy:
1. Multiplexor based positive latch
2. 1 bit register using the positive latch
3. Bitslice comprising an XOR-FA-Register chain
4. 8 bitslices stacked together one on top of the other to form the 8 bit datapath
5. AND gate to provide the gated clock logic
6. XOR gate to provide the overflow logic
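The datapath built from these stages can be sketched at the logic level as follows. The sketch assumes that the bInvert signal both inverts the B input through the XOR gates and supplies the initial carry-in, and that overflow is the XOR of the carry into and out of the most significant bit; both are common arrangements, but they are assumptions here rather than details confirmed by the write-up. Registers and the gated clock are left out.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder returning (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def add_sub_8bit(a, b, b_invert):
    """Ripple through eight bitslices; b_invert=1 computes a - b."""
    carry = b_invert  # assumed: bInvert doubles as the initial carry-in
    result = 0
    carry_into_msb = 0
    for i in range(8):
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ b_invert  # XOR stage of the bitslice
        if i == 7:
            carry_into_msb = carry
        sum_bit, carry = full_adder(a_bit, b_bit, carry)
        result |= sum_bit << i
    overflow = carry_into_msb ^ carry  # assumed two's complement overflow logic
    return result, overflow


if __name__ == "__main__":
    print(add_sub_8bit(5, 3, 0))
    print(add_sub_8bit(5, 3, 1))
    print(add_sub_8bit(127, 1, 0))
```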

All the above mentioned components were simulated using IRSIM and corrected when required. The layout was done in a similar fashion. After the completion of every stage in Magic, an LVS comparison was done to validate the functionality of the layout. Since the schematics had been simulated successfully, any mismatches produced by Gemini indicated a problem in the layout.

Click here for the final schematic. Click here for the final layout.

Critical Path Delay obtained = (bInvert to sum7) + (register clock to R7) = 4.88 + 0.518 ns = 5.398 ns


Figure 4. Top Level Block Diagram

 

4-Bit Datapath - Calculator

Team partner: Nikhil Jain

Stack Architecture is one of the microprocessor architectures that presents various complexities and details in the design process. Unlike Accumulator, Two-register or Three-register architectures, Stack Architecture does not have any General Purpose Registers (GPR). This complicates the design process for a 4-bit central processing unit (CPU) or microprocessor. The size of the stack depends on the type of instructions used and the data width. For example, a 4-bit CPU requires 8 nibbles of 4 bits each to store a complete 32-bit word. The stack only allows the topmost element to be accessed at any given clock cycle. The Intel 4004, built in 1971, was the first single-chip 4-bit microprocessor.

 

CPU Specifications

  • Datapath: 4 bits
  • RAM width: 4 bits
  • Stack width: 4 bits
  • Address Space: 4096 locations
  • Address Bus: 12 bits
  • No Exceptions or interrupts
  • Memory Mapped I/O
  • Little Endian Memory Organization
  • Reserved Memory Range: 0xFE2 – 0xFFF (for return address)
  • Variable Instruction Formats
  • Does not have any General Purpose Registers (GPR)
 

CPU Details

  • 4-bit Datapath means that all the data buses are 4 bit wide only. Therefore, the CPU (central processing unit) can handle operations on 4-bit ‘words’ only. Thus, memory and stack data width is also 4 bits.
  • However, the address bus is 12 bits wide, giving an address space of 4096 nibbles of 4 bits each. Thus a maximum of 512 thirty-two-bit words can be stored.
  • Interrupt handling is not implemented, as it requires extra hardware, which would make the CPU costly and would not serve the goal of this project.
  • Memory Mapped I/O means that all the devices read/write data from/to a specific range of addresses.
  • ‘Little Endian’ memory organization is used for this architecture. It does not have any specific benefit compared to the Big Endian memory model; it is just a matter of choice, and the compiler has to be programmed accordingly. In the Little Endian model, the least significant byte has the lowest address.
  • As the CPU performs subroutine calls, it has to save the return address from a subroutine in memory. To make sure that the address is not overwritten by data, the memory is partitioned and only return addresses can be written to the reserved partition.
    Address range 0xFE2 – 0xFFF is reserved for this purpose.
  • The Program Counter (PC) and Memory Address Register (MAR) are 12 bits wide, as they correspond to the address bus, and so they can address up to 4096 locations.
  • Since the data bus is 4 bits wide, all instruction lengths are multiples of 4 bits, which makes memory access easier and allows a variety of instruction lengths.
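The memory organization described above can be illustrated with a short sketch that stores a 32-bit word as eight nibbles in little-endian order within a 4096-nibble memory. The function names are hypothetical, and the model applies the little-endian rule at nibble granularity, which is an interpretation for a 4-bit-wide memory.

```python
MEMORY_SIZE = 4096      # 12-bit address bus: 2**12 nibble locations
NIBBLES_PER_WORD = 8    # 8 nibbles x 4 bits = one 32-bit word


def store_word(memory, address, value):
    """Store a 32-bit value as 8 nibbles, least significant nibble first."""
    for i in range(NIBBLES_PER_WORD):
        memory[address + i] = (value >> (4 * i)) & 0xF


def load_word(memory, address):
    """Reassemble a 32-bit value from 8 consecutive nibbles."""
    value = 0
    for i in range(NIBBLES_PER_WORD):
        value |= memory[address + i] << (4 * i)
    return value


if __name__ == "__main__":
    memory = [0] * MEMORY_SIZE
    store_word(memory, 0, 0x12345678)
    print(memory[:NIBBLES_PER_WORD])
    print(hex(load_word(memory, 0)))
    print(MEMORY_SIZE // NIBBLES_PER_WORD, "words fit in memory")
```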

Instruction Set

The instruction set used in this project was based on the 0-Address Architecture, also known as the Stack Architecture. The instructions form a set of tools that perform the following types of operations:

  • Data Movement Instructions
  • Arithmetic Instructions
  • Logical Instructions
  • Program Control Instructions
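As a rough illustration of how a 0-address machine evaluates expressions, the sketch below interprets a few stack instructions on 4-bit values. The mnemonics (PUSH, POP, ADD, SUB, AND, OR) are hypothetical placeholders, since the actual instruction set is not listed here; operands come only from the top of the stack, and results wrap to 4 bits.

```python
def run(program):
    """Execute a list of stack instructions and return the final stack."""
    stack = []
    for instruction in program:
        op = instruction[0]
        if op == "PUSH":
            # Data movement: push a 4-bit literal onto the stack.
            stack.append(instruction[1] & 0xF)
        elif op == "POP":
            stack.pop()
        elif op in ("ADD", "SUB", "AND", "OR"):
            # Arithmetic and logical operations take both operands
            # from the top of the stack; no registers are named.
            b = stack.pop()
            a = stack.pop()
            if op == "ADD":
                result = a + b
            elif op == "SUB":
                result = a - b
            elif op == "AND":
                result = a & b
            else:
                result = a | b
            stack.append(result & 0xF)  # keep only the low 4 bits
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


if __name__ == "__main__":
    print(run([("PUSH", 7), ("PUSH", 12), ("ADD",)]))
```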
