Voice and Video Conferencing Fundamentals
In Voice and Video Conferencing Fundamentals, three leading experts systematically introduce the principles, technologies, and protocols underlying today's IP-based audio and video conferencing systems.



Voice and Video Conferencing Fundamentals, by Scott Firestone, Thiya Ramalingam, and Steve Fry (Cisco Press): design, develop, select, deploy, and support advanced IP-based audio and video conferencing systems.




Topics covered in this book include video codecs, media control, SIP, and H.323. Voice and Video Conferencing Fundamentals is for every professional involved with audio or video conferencing. Thiya Ramalingam is an engineering manager for the Cisco Unified Communications organization.

Steve Fry, a technical leader in the Cisco Unified Communications organization, has spent the last several years designing and developing telephony and conferencing products.

Books in this series introduce networking professionals to new networking technologies, covering network topologies, sample deployment concepts, protocols, and management techniques.

The components of a distributed conferencing system must establish signaling relationships to work together as a single system.

The distributed system appears to the end user as a single device, but in fact it is a network of devices, each providing a specific service. The Session Initiation Protocol (SIP) is especially well suited to supporting such a distributed framework, so the next section describes one example of a distributed conferencing system built on top of SIP. This model consists of several components. The focus maintains a signaling relationship with all the endpoints, or participants, in the conference.

Each conference must have a unique address of record (AoR) that corresponds to a focus. A conference server could contain multiple focus instances, and each focus may control a single conference.

Each conference operates under the constraints described by the conference policy. The conference policy describes the operational characteristics of the conference instance. This governance controls all meeting services, including security aspects such as membership policy and media policy. Membership policy controls such attributes as which endpoints can join the conference, what capabilities they have, how long a meeting should last, and when a conference should remove a participant.

Media policy prescribes the range of stream characteristics for the various streams in the conference. These characteristics include allowable audio and video codecs, the minimum and maximum bandwidth, the maximum number of participants, and so on.

Conference Policy Server

The conference policy server is the repository for the various policies stored in the system. There is only one instance of the conference policy server within the system.
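The membership and media constraints described above can be viewed as a simple policy record that the focus consults before admitting a participant. The sketch below is illustrative only; the field names and the admission check are assumptions, not part of any standardized conference policy schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MembershipPolicy:
    # Which endpoints (by SIP URI) may join, plus basic meeting limits.
    allowed_participants: List[str] = field(default_factory=list)
    max_participants: int = 10
    max_duration_minutes: int = 60

@dataclass
class MediaPolicy:
    # Range of stream characteristics the conference will accept.
    allowed_audio_codecs: List[str] = field(default_factory=lambda: ["G.711", "G.722"])
    allowed_video_codecs: List[str] = field(default_factory=lambda: ["H.263", "H.264"])
    max_bandwidth_kbps: int = 768

@dataclass
class ConferencePolicy:
    conference_uri: str                      # the AoR that identifies the focus
    membership: MembershipPolicy = field(default_factory=MembershipPolicy)
    media: MediaPolicy = field(default_factory=MediaPolicy)

    def admits(self, participant_uri: str, current_count: int) -> bool:
        """Decision the focus makes before connecting a participant."""
        return (participant_uri in self.membership.allowed_participants
                and current_count < self.membership.max_participants)
```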

No standard protocol exists for communication between the focus and the policy server. When an endpoint asks to join the conference (for example, by sending a SIP INVITE to the conference URI), and the conference policy allows it, the focus connects the participant to the conference. When a conference is to be terminated, the focus sends a BYE message to each endpoint. After all endpoints have been disconnected, the instance of the focus and the conference policy associated with the conference are destroyed. All the resources (audio and video ports) associated with that conference are freed.

The focus rejects attempts by endpoints to reconnect to the unique conference URI.

Media Server

The media server establishes a signaling relationship with the focus on the control plane. It provides all the services of an audio mixer and video media processor (MP). The media server terminates all media streams from the endpoints and returns the mixed audio and video streams to each device based on conference policy.

Full-Mesh Networks

Another option for decentralized conferencing is a full-mesh conference. This architecture has no centralized audio mixer or MP.

Instead, each endpoint contains an MP that performs media mixing, and all endpoints exchange media with all other endpoints in the conference, creating an N-by-N mesh. Endpoints with less-capable MPs provide less mixing functionality.

Because each device sends its media to every other device, each one establishes a one-to-one media connection with every other conferenced endpoint. For example, a five-party full-mesh conference requires each endpoint to maintain four media sessions, for ten pairwise connections in total. Each pair of endpoints negotiates its media session independently, so different pairwise connections may use different codecs.

Endpoints that send media with the same characteristics (codec, frame rate) to multiple endpoints may use IP multicasting; in general, however, such support is not widely deployed in corporate networks. If no centralized signaling server is present, each endpoint must similarly establish a one-to-many signaling connection with all other endpoints in the conference. Endpoints may not use IP multicast for these signaling connections.

In the full-mesh conference topology, each device provides its own media processing, and therefore endpoints do not need to transrate or transcode video streams. By contrast, in a centralized conference that distributes a single shared output stream, the media processor in the conference server must reduce the quality of that single output video stream to the lowest common denominator of quality among the destination endpoints.

Advanced Conferencing Scenarios

Modern conferencing system designs provide more features by integrating the conference control with other collaboration services. For example, a user can join a conference call with a single mouse click instead of dialing a number and going through an authentication process. This section provides some examples of those advanced features. These scenarios assume that the endpoints have some basic capability, such as support for call transfer.

Escalation of a Point-to-Point Call to a Multipoint Call

In this scenario, a point-to-point call between two participants becomes a conference call with more than two parties.

Participant A is in a point-to-point call with participant B and wants to invite a third participant, participant C. Participant A finds a conference server, sets up the conference, gets the URI or meeting ID, and transfers the point-to-point call to the conference server. Participant A then invites participant C into the conference call.

Participant A can add participant C using different methods, one of which is a dial-out process. In a dial-out, the conference server sends an invitation to the endpoint, asking it to join the conference.

Lecture Mode Conferences

A lecture mode conference has a lecturer who presents a topic, and the rest of the participants can ask questions.

There are two different styles of lecture mode meetings. If the administrator denies the request from an audience member to ask a question, the audio from that audience member is not mixed, even if that participant is the loudest speaker.

In this case, the focus instructs the mixer to exclude video from that participant in the mix. In lecture mode video conferences, participants see the lecturer, and the lecturer sees the last participant who spoke. If none of the participants has spoken yet, the lecturer might see all the participants in a round-robin mode. In round-robin mode, the lecturer sees each participant for a few seconds.

Lecture-style meetings usually have data streams (web conferencing) associated with them. The participants can see the documents that the lecturer shares in a browser window.

Panel Mode Conference

A panel mode conference is a variation of the lecture mode conference. A panel mode conference has a few panelists and a larger number of participants. This scenario is similar to having more than one lecturer in a lecture mode conference. Depending on the conference policy, end users can see one or more panelists in a continuous presence mode, in addition to seeing the participant who is speaking or asking a question.

Floor Control

Floor control coordinates simultaneous access to the media resources in a conference. For instance, the meeting organizer or moderator can ensure that all participants hear only one participant. Or, the moderator can allow only certain participants to enter information into a shared document. End users can make floor control requests through a web interface or an interactive voice response (IVR) system. In addition, endpoints can provide access to floor control via floor control protocols.

Floor control protocols allow the endpoints and conference servers to initiate and exchange floor control commands.

Video Mixing and Switching Scenarios

When a user joins a video conference, the conference server offers the user one of a set of predefined video presentations.

The conference server describes each video presentation using a textual description and an image specifying how the presentation will appear on the screen. In this scenario, by choosing a video presentation, the user chooses how many video streams (participants) to view simultaneously and the layout of these video streams on the screen. Either conference policy or authorized participants may control the contents of each subwindow. Other aspects, such as the number of different mixes in the conference and the format of a custom mix for each user, are similar to audio mixing and use similar server capabilities and authorization methods.

The following are typical video presentations, representing some of the common layouts available today in commercial products. In the voice-activated layout, for example, the loudest speaker sees the last speaker.

If the last speaker has dropped out of the conference, the video mixer shows the previous last speaker.

Summary

This chapter provided an overview and comparison of several conferencing architectures and described the internal components that comprise these systems.


It also provided a detailed look at the theory of operation for an audio mixer and described the purpose and operations involved in video composition, transrating, and transcoding. The chapter closed with a review of the various types of meetings and video mixing scenarios.


Fundamentals of Video Compression

Most video conferencing endpoints negotiate a maximum channel bit rate before connecting a call, and the endpoints must limit the short-term one-way average bit rate to a level below this negotiated channel bit rate. A higher-efficiency codec can provide a higher-quality decoded video stream at the negotiated bit rate.

Quality can be directly measured in two ways. The first is perceptual quality, as judged by human viewers; it is important to note that this perceptual quality incorporates the size of the image, the frame rate, and the spatial quality of the video. A more objective measure of codec performance is the peak signal-to-noise ratio (PSNR), which measures how much a decoded image with pixel values PO(x,y) deviates from the original image PI(x,y).
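As a concrete illustration of the PSNR measure, the following sketch uses one common formulation for 8-bit images: the mean squared error between the original and decoded pixel arrays, referenced to the 8-bit peak value of 255. The CIF-sized test frame and the noise amplitude are illustrative assumptions.

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between an original image PI(x,y) and a decoded image PO(x,y)."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10((peak ** 2) / mse)

# Example: a decoded frame that deviates slightly from the original.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(288, 352))          # CIF-sized luma plane
decoded = np.clip(original + rng.integers(-2, 3, size=original.shape), 0, 255)
print(f"PSNR = {psnr(original, decoded):.1f} dB")
```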

PSNR does not correlate perfectly with perceived quality. Nonetheless, it is useful as a tool to compare different codecs. To minimize the end-to-end delay of the video streams, video codecs used for video conferencing must operate in a mode that supports low delay. As a result, these codecs might not be able to take advantage of extended features or special coding methods commonly used when compressing material for one-way viewing, such as the encoding used for DVDs.

Three such features that are typically unavailable to video conferencing codecs are B-frames, multipass coding, and offline coding. B-frames allow a codec to compress a frame using information from a frame in the past and a frame in the future. To compress a B-frame, the encoder must first process the future referenced frame, which requires the encoder to delay the encoding of the B-frame by at least one frame.

Because of the one-frame delay, most codecs for video conferencing do not implement B-frames. Multipass coding is the process of encoding a stream multiple times in succession.

After performing the first pass, the encoder analyzes the result and determines how the encoding process can be altered to create the same quality encoded video at a lower bit rate. A multipass codec typically re-encodes a bitstream at least once, and possibly twice. Obviously, this sort of multipass processing is not possible with a codec used for video conferencing.


Offline coding is simply the process of encoding a video sequence in non-real time, using computationally intensive offline hardware to achieve a lower bit rate with higher quality. Offline coding can provide a significant boost to codec efficiency, particularly for the more complex codecs.

However, this method is not available for video conferencing endpoints. When evaluating a codec to use in a video conferencing product, it is important to observe the quality of a decoded bitstream that was encoded without any of the prior methods.

The endpoints must also agree on a common video format; this format includes the codec algorithm and parameters of that algorithm, such as frame rate and bit rate.

Profiles and Levels

Codec specifications generally define a wide breadth of features that can be used to encode a video sequence.

Some of the more complex features might require additional resources, such as CPU power and memory. In addition, more CPU power is needed when decoding video with higher frame rates, image sizes, and bit rates.

Therefore, to accommodate decoders with fewer available resources, the codec specifications often define profiles and levels. A profile restricts the bitstream to a subset of the coding features, and a level bounds parameters such as image size, frame rate, and bit rate. Fewer features in the bitstream reduce the resources needed on the decoder and the decoder complexity.

As an example, some codec profiles prohibit B-frames, which normally require additional frame buffer memory and CPU processing.

Frame Rates, Form Factors, and Layouts

Two endpoints in a video conference negotiate a maximum video bit rate before connecting.

Video codecs can generate bitstreams ranging from 64 kbps to 8 Mbps and more. Higher bit rates consume more network bandwidth but provide greater video quality and frame rate. After the conference participants choose a video bandwidth, the endpoints choose a nominal frame rate, which is also negotiated between the two sides during call setup.

However, during the call, the actual frame rate might change over time, because the encoder must constantly trade off between bit rate, frame rate, and quality. When the video camera on an endpoint captures a high degree of motion, the encoder can maintain the same frame rate and quality by increasing the bit rate.

However, because the endpoints have predetermined the maximum allowable bit rate, the encoder must instead keep the bit rate constant and lower the frame rate or quality. For the Common Intermediate Format (CIF) and its variations, each pixel has a non-square aspect ratio (width to height). The codec standards often refer to pixels as pels and may define a pel aspect ratio. Some high-end video conferencing systems, such as telepresence endpoints, support HD video cameras.

SD and HD differ in several aspects: SD typically has a 4:3 display aspect ratio, whereas HD uses 16:9. When specifying the resolution or frame rate of an HD camera, it is common to add a p or an i at the end of the specification to denote progressive or interlaced scanning, respectively. Most often, the frame rate is left out of the notation, in which case it is assumed to be either 50 or 60. Also, a description of an HD signal may specify a frame rate without a resolution.

For instance, 24p means 24 progressive frames per second, and 25i means 25 interlaced frames per second. Much like interlaced processing, support for the higher resolution of HD encoding is limited to certain codecs, and often to specific profiles and levels within each codec.

Color Formats

The color and brightness information for pixels can be represented in one of several data formats. The YCbCr format represents each pixel using the brightness value (Y), along with color difference values (Cb and Cr), which together define the saturation and hue (color) of the pixel.

The brightness values comprise the luminance channel, and the color difference values comprise the two chrominance channels. The chrominance channels are often referred to as chroma channels. The video codecs discussed in this chapter process images in the YCbCr color format and therefore rely on the video-capture hardware to provide frame buffers with YCbCr data, typically by converting the camera's RGB output; this process is called colorspace conversion.
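Colorspace conversion can be sketched as a simple weighted combination of the R, G, and B values. The weights below are the commonly used BT.601 values for a full-range conversion; actual capture hardware may use different coefficients and value ranges, so treat this as an illustrative assumption rather than the conversion any particular endpoint performs.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 array of 8-bit RGB pixels to full-range YCbCr (BT.601 weights)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.clip(np.stack([y, cb, cr], axis=-1).round(), 0, 255).astype(np.uint8)

# Pure white maps to maximum luma and neutral chroma: Y = 255, Cb = Cr = 128.
print(rgb_to_ycbcr(np.array([[[255, 255, 255]]], dtype=np.uint8)))
```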

Video encoders process data in YCbCr format because this format partitions the most important visual information into the Y channel, with less-important information in the Cb and Cr channels. The human visual system is more sensitive to degradation in the luminance channel (Y) and is less sensitive to degradation in the chrominance channels.

Therefore, the data pathways in the encoder can apply high compression to the Cr and Cb channels and still maintain good perceptual quality. Encoders apply lower levels of compression to the Y channel to preserve more visible detail. Codecs process YCbCr data that consists of 8 bits in each channel, but some codecs offer enhanced modes that support higher bit depths. The first operation of the encoder is to reduce the resolution of the Cr and Cb channels before encoding, a process known as chroma decimation.

Several formats exist for chroma decimation. The original, full-resolution frame of source video from the camera is represented in a format called 4:4:4, where each 4 represents a full-resolution channel: the Y, Cb, and Cr channels all have full resolution. The codecs discussed in this chapter use a format known as 4:2:0, in which the chroma channels have half the resolution of the luma channel both horizontally and vertically. Studio processing demands the highest resolution for chroma channels. Chroma keying is a special effect that replaces a specific color in the video sequence with a different background video signal.

A typical chroma key video production places a green screen behind an actor and then later replaces the green color with a different background video. The chroma key replacement operation provides the best results if the chroma channels are available at the highest resolution possible, to perform the pixel-by-pixel replacement in areas with a highly complex pattern of foreground and background pixels, such as areas of fine wispy foreground hair.

To downsample from 4:4:4 to 4:2:0, the encoder filters and interpolates the chroma channels down to the lower resolution. Codecs for video conferencing use one of two variations of this interpolation, which differ in where the interpolated chrominance samples are positioned relative to the luminance samples.
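A minimal sketch of the decimation step follows: it reduces one full-resolution chroma plane to quarter size (half in each dimension) by averaging each 2x2 neighborhood. Real encoders use the filter taps and sample positions defined by the codec in question, so the simple averaging here is an illustrative assumption.

```python
import numpy as np

def decimate_chroma_420(chroma: np.ndarray) -> np.ndarray:
    """Downsample one full-resolution chroma plane (4:4:4) to 4:2:0 by 2x2 averaging."""
    h, w = chroma.shape
    assert h % 2 == 0 and w % 2 == 0, "expects even plane dimensions"
    blocks = chroma.astype(np.float64).reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)

cb_full = np.arange(16, dtype=np.uint8).reshape(4, 4)   # toy 4x4 chroma plane
print(decimate_chroma_420(cb_full))                     # 2x2 plane of 2x2 block averages
```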

Interlaced content uses a further variation of the chrominance locations: for each individual field in an interlaced image, the encoder offsets the location of the chroma interpolation point up or down vertically, depending on whether the field is the top field or the bottom field. As a result, the chroma sampling positions are spatially uniform, both within each field and within the entire two-field frame.

Basics of Video Coding

Video coding involves four major steps: prediction, transformation, quantization, and entropy coding. At the heart of the encoder is a feedback loop that predicts the next frame of video and then transmits the difference between this prediction and the actual frame.

Because the encoder uses a recently decoded frame to generate a prediction, the encoder has a decoder embedded within the feedback loop.

Preprocessing

Before an image is handed to the encoder for compression, most video conference endpoints apply a preprocessor to reduce video noise and to remove information that goes undetected by the human visual system. Noise consists of high-frequency spatial information, which can significantly increase the pixel data content, and therefore increase the number of bits needed to represent the image.

One of the simpler methods of noise reduction uses an infinite impulse response (IIR) temporal filter. In this scenario, the preprocessor creates a new output frame by adding together two weighted frames: the current input frame multiplied by a weighting factor, and the previous output of the preprocessor multiplied by the complementary weight, so that the two weights sum to 1. This process effectively blurs the image slightly in the temporal direction, reducing background noise. When participants use a noisy video source, such as a webcam, the endpoints can apply a stronger temporal filter by increasing the percentage of the previous frame used in each iteration.
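The temporal noise filter can be sketched as a one-tap IIR filter that blends each incoming frame with the previous filter output. The current-frame weight of 0.75 is an assumed illustrative value; a noisier source calls for a smaller current-frame weight, that is, a stronger filter.

```python
import numpy as np

class TemporalNoiseFilter:
    """One-tap IIR filter: output = w * current_frame + (1 - w) * previous_output."""

    def __init__(self, current_weight: float = 0.75):
        self.w = current_weight
        self.prev_output = None

    def filter(self, frame: np.ndarray) -> np.ndarray:
        frame = frame.astype(np.float64)
        if self.prev_output is None:
            self.prev_output = frame          # first frame passes through unchanged
        else:
            self.prev_output = self.w * frame + (1.0 - self.w) * self.prev_output
        return np.clip(self.prev_output.round(), 0, 255).astype(np.uint8)

# Feeding a static scene with additive noise yields progressively cleaner output frames.
```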

A second function of a preprocessor is to remove information that is generally not perceived by the human visual system. As a result, the encoding algorithm produces a smaller bitstream with less information, but without a perceptible loss of detail. Preprocessing operations often take advantage of the fact that the human visual system perceives less spatial resolution in areas of the image that contain a high degree of motion.

The preprocessor performs this operation on a pixel-by-pixel basis by calculating the difference in value between a pixel in the current frame and the corresponding pixel in the previous frame. If this difference is greater than a threshold value, this pixel is deemed to be in an area of high motion, and the preprocessor can apply blurring to that pixel in the current frame, typically using a spatial low-pass filter.
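A sketch of that motion-adaptive step is shown below: pixels whose frame-to-frame difference exceeds a threshold are smoothed with a small spatial low-pass kernel. The threshold of 24 and the 3x3 box kernel are assumed illustrative choices, not values taken from any particular endpoint.

```python
import numpy as np

def motion_adaptive_blur(current: np.ndarray, previous: np.ndarray,
                         threshold: int = 24) -> np.ndarray:
    """Blur only the pixels that moved: |current - previous| > threshold."""
    cur = current.astype(np.float64)
    moving = np.abs(cur - previous.astype(np.float64)) > threshold

    # 3x3 box filter implemented with edge padding (a simple spatial low-pass filter).
    padded = np.pad(cur, 1, mode="edge")
    blurred = sum(padded[dy:dy + cur.shape[0], dx:dx + cur.shape[1]]
                  for dy in range(3) for dx in range(3)) / 9.0

    out = np.where(moving, blurred, cur)
    return np.clip(out.round(), 0, 255).astype(np.uint8)
```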

Even though preprocessing is almost always used by video conferencing endpoints, the decoder is unaware of the process. Because codec specifications describe only how a decoder interprets the bitstream, preprocessing is not within the scope of the standards and therefore is never mentioned in codec specifications.

However, to achieve high quality, endpoints generally must implement one or more of these preprocessing steps.

Postprocessing

The encode/decode process is lossy, meaning that the decoded image deviates slightly from the original image. After the decoding process, the resulting pixel values deviate from the original pixel values somewhat smoothly within each block. However, the pixel deviations might not match up at the boundaries between two adjacent blocks. Such a mismatch in pixel deviations at a boundary causes a visible discontinuity between adjacent blocks, a phenomenon known as block artifacting. To combat block artifacts, decoders can implement deblocking filters, which detect these block border discontinuities and then modify the border pixels to reduce the perceptual impact of the block artifacts.

Deblocking filters can range in complexity from simple to extremely complicated. In a simple filter, if the pixel difference across a block border is above a preset threshold, the post-processor can apply a blurring operation to the pixels on each side of the border. In addition, if the deviations at the boundary are great, the blurring filter can modify both the pixels at the border and the pixels one position farther away from the border. Because the encoder and decoder must remain in lockstep, they must each use an identical reference frame with identical post-processing.
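The simple threshold-based filter just described might look like the following sketch, which operates on the horizontal block boundaries of one luma plane. The block size of 8, the threshold of 10, and the smoothing strength are assumptions for illustration; standardized deblocking filters, such as the in-loop filter in H.264, are considerably more elaborate.

```python
import numpy as np

def deblock_rows(plane: np.ndarray, block: int = 8, threshold: float = 10.0) -> np.ndarray:
    """Smooth horizontal block boundaries where the cross-boundary jump is large."""
    out = plane.astype(np.float64)
    for row in range(block, out.shape[0], block):
        above, below = out[row - 1, :], out[row, :]
        jump = np.abs(below - above)
        strong = jump > threshold                  # boundary pixels showing an artifact
        midpoint = (above + below) / 2.0
        # Move each boundary pixel a quarter of the jump toward the other side,
        # halving the visible discontinuity.
        out[row - 1, strong] += 0.5 * (midpoint[strong] - above[strong])
        out[row, strong]     += 0.5 * (midpoint[strong] - below[strong])
    return np.clip(out.round(), 0, 255).astype(np.uint8)
```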

Encoder Overview

Video codecs may apply intracoding or intercoding. An intraframe, also called an I-frame, is a frame that is coded using only information from the current frame; an intracoded frame does not depend on data from other frames. In contrast, an interframe may depend on information from other frames in the video sequence.

The intra coding model consists of three main processes applied to each frame: transform, quantization, and entropy coding. The decoder provides the corresponding inverse processes to undo the steps in the encoder and recover the original video frame. The transform process converts a video image from the original spatial domain into the frequency domain. The frequency domain representation expresses the image in terms of the two-dimensional frequencies present in the original image.

There are two types of transforms used by the various codecs: the discrete cosine transform (DCT) and the integer transform. The output of each transform is an array of the same size as the input block. The DCT and integer transforms differ mathematically, but they provide the similar function of decomposing the spatial domain into the frequency domain; this direction is referred to as the forward DCT, or FDCT.
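To make the forward transform concrete, the sketch below computes an orthonormal 2-D DCT of an 8x8 block using the separable matrix form D·X·Dᵀ. Standardized codecs use fixed-point or integer approximations with their own scaling, so this floating-point version is illustrative only.

```python
import numpy as np

def fdct_8x8(block: np.ndarray) -> np.ndarray:
    """Forward 2-D DCT (orthonormal DCT-II) of an 8x8 block of sample values."""
    n = 8
    x = np.arange(n)
    u = x.reshape(-1, 1)
    basis = np.cos((2 * x + 1) * u * np.pi / (2 * n))        # row u, column x
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    dct_matrix = scale[:, None] * basis
    return dct_matrix @ block @ dct_matrix.T                 # separable row/column transform

# A smooth horizontal ramp: energy concentrates in the top row (zero vertical frequency).
ramp = np.tile(np.arange(8, dtype=np.float64) * 16, (8, 1))
print(np.round(fdct_8x8(ramp), 1))
```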

Each position in the transform output array corresponds to a particular two-dimensional pattern of pixels.

Each of these patterns is called a basis function, and the values in the output array of the transform are referred to as coefficients: each coefficient indicates how much of the corresponding basis function is present in the block.

The coefficients correspond to frequency patterns as follows: coefficients near the upper-left corner of the array represent low-frequency patterns, and coefficients toward the lower-right corner represent progressively higher-frequency patterns. In addition, the coefficient at the upper-left corner is referred to as the DC coefficient because it represents the amount of zero-frequency information in the block. This zero-frequency information is just a representation of the average value of all pixels in the block. The notation DC refers loosely to the concept of direct current, which yields a constant voltage.

The remaining coefficients are called AC coefficients because they correspond to varying frequency patterns. The notation AC refers loosely to the concept of alternating current, which yields a constantly changing voltage. The transformation from spatial domain to frequency domain facilitates image compression in two ways. First, typical images consist of mainly low-frequency information, which can be represented with a small number of values from the upper left of the DCT output array.

Typical images have little or no high-frequency information, which means that the output of the transform will have values near the lower-right corner that are either small or zero. All codecs in this chapter use this feature. Second, the human visual system is less sensitive to high-frequency detail, so the encoder can reduce the precision of coefficients representing the high-frequency information without severely affecting the perceived quality of the encoded video.

As a result, all codecs represent the lowest-frequency coefficient (the DC coefficient) with a high degree of precision.

Quantization

The processing unit that performs the quantization step is the quantizer.

Quantization is the process of reducing the precision of the frequency domain coefficients. In the simplest form, the encoder quantizes each coefficient by dividing it by a fixed value and then rounding the result to the nearest integer. By reducing the precision of coefficients, less information is needed to represent the frequency domain values, and therefore the bit rate of the encoded stream is lower.

However, because the quantization process removes precision, some information from the original image is lost. Therefore, this process reduces the quality of the encoded image. As a result, codec schemes that use quantization are considered lossy codecs, because the quantization process removes information that cannot be recovered.

Quantization is performed using an input-output transfer function. The transfer function is always a stairstep, and the fewer the steps, the coarser the quantization.

The range of each step on the input (x) axis is called the quantization step size. For example, a quantization step size of 12 means that each step maps 12 different input values to the same output value. The output values are integer indexes, known as quantization levels. In the intraframe pipeline, the quantizer operates on output transform coefficients, which may consist of signed numbers.

The one exception is the scenario in which the DCT operates on original pixel values; in this case, the DC coefficient represents the average value of all pixels in the original image block and therefore is always positive. However, the quantization transfer function must accommodate both positive and negative values of DCT coefficients.

Most codecs define the precision of the DCT coefficient values to be 4 bits more than the precision of the input values. For 8-bit inputs, the DCT output values therefore have 12 bits of precision, corresponding to values in the range [-2048, 2047].

Because the raw coefficients from the transform function have higher precision than the original image pixels, the quantizer must accommodate this wider range of input values. Transfer functions generally apply clipping, a process that limits the output of the function to a maximum value.

For example, the quantization process might clip all input values at or above a certain magnitude to a maximum index of 8. The transfer function might or might not apply a dead zone, which clamps input values in the vicinity of 0 to 0.

This dead zone attempts to eliminate low-level background noise; if a coefficient is close to 0, it is assumed to be background noise and gets clamped to 0.
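The following sketch pulls these pieces together: a uniform scalar quantizer with a configurable step size, a dead zone around zero, and clipping to a maximum index, plus the matching reconstruction. The specific step size, dead-zone width, and index limit are assumptions for illustration.

```python
import numpy as np

def quantize(coeffs: np.ndarray, step: int = 12, dead_zone: int = 6,
             max_index: int = 8) -> np.ndarray:
    """Uniform quantizer: map coefficients to signed integer levels (indexes)."""
    levels = np.rint(coeffs / step).astype(np.int32)          # divide and round
    levels[np.abs(coeffs) < dead_zone] = 0                    # clamp near-zero noise to 0
    return np.clip(levels, -max_index, max_index)             # limit the output range

def dequantize(levels: np.ndarray, step: int = 12) -> np.ndarray:
    """Reconstruction: the decoder can only recover one value per step."""
    return levels.astype(np.float64) * step

coeffs = np.array([-130.0, -4.0, 3.0, 7.0, 25.0, 300.0])
levels = quantize(coeffs)
print(levels)                  # [-8  0  0  1  2  8]
print(dequantize(levels))      # precision removed by quantization cannot be recovered
```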

In some cases, the transfer function can have nonuniform step sizes, in which the degree of quantization coarseness is proportional to the magnitude of the input value. The principle is that larger input values may be able to suffer a proportionately higher amount of quantization without causing an increase in relative distortion. None of the video codecs in this chapter uses nonuniform step sizes; however, some of the ITU-T G-series audio codecs do. Quantization of the transform coefficients may use one of two methods: a single step size applied to every coefficient in the block, or matrix quantization. Typically, the matrix quantization process applies a larger step size to higher-frequency coefficients located near the lower right of the transform, because the human visual system is less sensitive to these frequency patterns.

Codecs that use matrix quantization generally assign a single quantization level to a block and then use a matrix of numbers to scale the quantization level to the final step size used for each coefficient.

The quantization matrix for interblocks defined in MPEG-4 Part 2 is:

 8 17 18 19 21 23 25 27
17 18 19 21 23 25 27 28
20 21 22 23 24 26 28 30
21 22 23 24 26 28 30 32
22 23 24 26 28 30 32 35
23 24 26 28 30 32 35 38
25 26 28 30 32 35 38 41
27 28 30 32 35 38 41 45

In most codecs, the bitstream does not specify a step size directly; instead, the bitstream contains a quantization value, often denoted by the variable Q.

The encoder and decoder then use this Q value to derive the final quantization step size. A high Q value results in a larger step size and more compression.
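The matrix-based approach can be sketched as below, using the MPEG-4 Part 2 matrix reproduced above. The way the Q value and the matrix entry combine into a per-coefficient step size differs from codec to codec; the simple product-divided-by-a-constant used here is an assumption for illustration, not the normative MPEG-4 arithmetic.

```python
import numpy as np

# The 8x8 quantization matrix shown above (MPEG-4 Part 2).
W = np.array([
    [ 8, 17, 18, 19, 21, 23, 25, 27],
    [17, 18, 19, 21, 23, 25, 27, 28],
    [20, 21, 22, 23, 24, 26, 28, 30],
    [21, 22, 23, 24, 26, 28, 30, 32],
    [22, 23, 24, 26, 28, 30, 32, 35],
    [23, 24, 26, 28, 30, 32, 35, 38],
    [25, 26, 28, 30, 32, 35, 38, 41],
    [27, 28, 30, 32, 35, 38, 41, 45],
])

def matrix_quantize(coeffs: np.ndarray, q: int) -> np.ndarray:
    """Scale the single Q value by the matrix to get a per-coefficient step size."""
    step = q * W / 8.0                        # assumed scaling rule, for illustration
    return np.rint(coeffs / step).astype(np.int32)

# Higher-frequency coefficients (toward the lower right) get larger steps and coarser levels.
```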

Entropy Coding

The final stage of the generalized encoder is entropy coding. Entropy coding is a lossless coding scheme that seeks to reduce the bit rate by eliminating redundancy in the bitstream. Entropy coding generally operates on a string of one-dimensional data, which means that each two-dimensional quantized coefficient array must be converted into a one-dimensional string, typically by scanning the array in a zigzag order from the low-frequency to the high-frequency positions.

The entropy of a bitstream is defined as the lowest theoretical average number of bits per symbol needed to represent the information in the bitstream.

It also corresponds to the theoretical minimum number of bits per symbol that an ideal entropy coder can achieve. If the probability of each symbol n in the bitstream is P(n), the entropy of the bitstream is calculated using the Shannon entropy formula: H = -Σ P(n) log2 P(n), where the sum runs over all distinct symbols n.
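A short sketch of the entropy calculation follows; it estimates symbol probabilities from a sample sequence and applies the formula above. The sample sequence is an illustrative assumption.

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(symbols) -> float:
    """Shannon entropy H = -sum(P(n) * log2(P(n))) over the distinct symbols n."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A skewed sequence (many zeros, as in quantized DCT coefficients) has low entropy.
print(entropy_bits_per_symbol([0] * 12 + [1, 1, 2, 3]))   # well under 2 bits/symbol
```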

Sequences that have lower entropy can be coded using fewer bits per input value. These are sequences with a more highly skewed probability distribution, in which some input values occur much more frequently than other values. Such is the case for DCT coefficients, which have probability distributions skewed toward lower values. Entropy coding generally falls into three categories: run-length coding, variable-length coding, and arithmetic coding.

Run-Length Coding

The simplest form of entropy coding is run-length coding, which achieves compression for streams containing a pattern in which a value in the stream is often repeated several times in a row.

When a single value is repeated, it is more efficient for the encoder to specify the value of the repeated number and then specify the number of times the value repeats.

A 1-D sequence of quantized DCT coefficients often contains long runs of zeros for the high-frequency coefficients, allowing a run-length coder to achieve a high degree of lossless compression. When the run-length coder specifically codes the number of zeros between nonzero values, this coding scheme is often called a zero-run-length coder. The decoder expands each run and length pair from the encoder into the original uncompressed string of values.
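A minimal zero-run-length coder for a 1-D coefficient string is sketched below: each nonzero value is sent together with the count of zeros that preceded it, followed by an end marker. The pair format and the end-of-block marker are illustrative assumptions, not the symbol format of any particular codec.

```python
from typing import List, Tuple

EOB = (-1, 0)   # assumed end-of-block marker for this sketch

def zrl_encode(coeffs: List[int]) -> List[Tuple[int, int]]:
    """Encode as (zero_run, value) pairs: the number of zeros before each nonzero value."""
    pairs, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append(EOB)                      # trailing zeros are implied by the marker
    return pairs

def zrl_decode(pairs: List[Tuple[int, int]], length: int) -> List[int]:
    """Expand the pairs back into the original uncompressed string of values."""
    out: List[int] = []
    for run, value in pairs:
        if (run, value) == EOB:
            break
        out.extend([0] * run + [value])
    return out + [0] * (length - len(out))

coeffs = [35, 0, 0, -3, 0, 0, 0, 2, 0, 0, 0, 0]
pairs = zrl_encode(coeffs)
assert zrl_decode(pairs, len(coeffs)) == coeffs
print(pairs)    # [(0, 35), (2, -3), (3, 2), (-1, 0)]
```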

Variable-Length Coding

Variable-length coding (VLC) lowers the number of bits needed to code a sequence of numbers if the sequence has a nonuniform statistical distribution. The statistical distribution of the magnitudes of AC coefficients is highly skewed, with a much higher probability of encountering lower-valued coefficients. Only the magnitudes are considered here, because most codecs encode the sign of each AC coefficient separately from the magnitude.

For these data profiles with skewed probabilities, VLC attempts to represent the high-probability values with shorter bit sequences and the lower-probability values with longer bit sequences. One standard VLC table used in H.264 is the Exp-Golomb-coded syntax. After the encoder constructs the code table, it uses the table to look up the variable-length bitstream of each input value.

Instead of using a mathematical algorithm, this process uses a mapping method to map a set of input values into a set of variable-length strings. Therefore, the values in the input set are often referred to as indexes, or symbols, and many of the codec specifications refer to input symbols, rather than input values.

Codecs also refer to the VLC table as the symbol code table. Because the 0 value has the highest probability, it is represented using a single bit, with a bit string of 1.

The input symbol 1 is represented using 3 bits, with a bit string of 010. The idea is to use a VLC table that minimizes the average number of bits per symbol over the entire bitstream, calculated as the sum over all symbols n of P(n) × L(n), where L(n) is the length in bits of the code for symbol n. The VLC process consists of two phases: constructing (or selecting) the code table, and then encoding each symbol by table lookup. Most codecs use precalculated, fixed VLC tables, which are based on typical probability distributions encountered in DCT coefficients of natural images.
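The unsigned Exp-Golomb code can be generated procedurally rather than stored as an explicit table: the codeword for symbol k is the binary form of k + 1 preceded by one fewer zero bits than that binary form has bits. A small sketch, with an assumed probability distribution used to evaluate the average code length:

```python
def exp_golomb_encode(code_num: int) -> str:
    """Order-0 unsigned Exp-Golomb: (len-1) zero bits, then the binary form of code_num + 1."""
    binary = format(code_num + 1, "b")
    return "0" * (len(binary) - 1) + binary

for k in range(5):
    print(k, exp_golomb_encode(k))
# 0 '1'   1 '010'   2 '011'   3 '00100'   4 '00101'

# Average bits per symbol for an assumed skewed distribution P(k).
P = {0: 0.6, 1: 0.2, 2: 0.1, 3: 0.05, 4: 0.05}
avg = sum(P[k] * len(exp_golomb_encode(k)) for k in P)
print(f"average code length = {avg:.2f} bits/symbol")
```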

To work, the VLC table must exhibit one property: no VLC entry in the code table is permitted to be a prefix of any other entry (that is, the code must be prefix-free). An input stream with a more highly skewed probability distribution can take advantage of a VLC table that results in a VLC output stream with a lower average number of bits per coded symbol.
