
Apache Avro Reference



About Apache Avro Reference

The Apache Avro Schema Reference is a structured, searchable guide to the complete Avro data serialization system used throughout the Apache Kafka and Hadoop ecosystems. Avro uses JSON-based schemas to define data structures, making it a popular choice for high-throughput event streaming pipelines. This reference covers all primitive types — null, boolean, int, long, float, double, bytes, and string — as well as every complex schema type including record, enum, array, map, union, and fixed, each with accurate JSON schema examples you can copy directly into your project.
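For a taste of the JSON schema syntax these entries document, a single record can combine several of the types above (the PageView name and its fields are purely illustrative):

```python
import json

# Hypothetical schema mixing primitives, a nullable union, an array, and an enum.
schema = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "url", "type": "string"},
        {"name": "referrer", "type": ["null", "string"], "default": None},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "device", "type": {"type": "enum", "name": "Device",
                                    "symbols": ["DESKTOP", "MOBILE", "TABLET"]}},
    ],
}
print(json.dumps(schema, indent=2))
```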

Data engineers, backend developers, and architects working with Kafka Schema Registry or Apache Spark will find the schema evolution section especially valuable. This reference clearly explains the three compatibility modes: Forward Compatibility (new schema, old reader), Backward Compatibility (old schema, new reader), and Full Compatibility (both directions). Understanding Schema Resolution rules — how field matching works between writer and reader schemas — is essential for maintaining reliable data pipelines, and this reference provides concise, practical examples for each scenario.

The logical types section covers Avro's typed overlays on top of primitive types: the date logical type (int storing days since epoch), timestamp-millis (long storing milliseconds since epoch), and decimal (bytes with precision and scale). Organized into four categories — Types, Schema, Evolution, and Logical Types — this reference is designed for quick lookup during development, code review, or when designing Kafka topic schemas with a Schema Registry.

Key Features

  • All 8 Avro primitive types with JSON schema syntax and examples (null, boolean, int, long, float, double, bytes, string)
  • All 6 complex schema types: record, enum, array, map, union, and fixed with realistic code examples
  • Schema evolution section covering Forward, Backward, and Full Compatibility modes with practical rules
  • Schema Resolution rules explaining how writer/reader field matching works in Avro
  • Logical types reference: date, timestamp-millis, and decimal with precision/scale usage
  • Searchable and category-filterable interface for fast lookup during development
  • Syntax-highlighted JSON code examples ready to copy into schema definitions
  • Completely free, no account required, works offline after first load

Frequently Asked Questions

What is Apache Avro and when should I use it?

Apache Avro is a data serialization framework that uses JSON-defined schemas to encode data in a compact binary format. It is widely used in Apache Kafka pipelines with Schema Registry, Hadoop-based data lakes, and any system where compact, schema-validated serialization is important. Avro is especially well-suited when you need schema evolution — the ability to change data structures over time without breaking existing consumers.

What is the difference between Avro record, map, and struct types?

A record is Avro's equivalent of a struct or object: it has named fields, each with its own type. A map is a key-value store where keys are always strings and all values share one type (e.g., {"type":"map","values":"int"}). Avro has no separate struct keyword; record is used for all structured objects.
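The difference can be sketched directly in schema JSON (the User record and login-count map are made-up examples):

```python
import json

# record: named fields, each with its own type
user_record = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

# map: string keys, one shared value type for every entry
login_counts = {"type": "map", "values": "int"}

print(json.dumps(user_record))
print(json.dumps(login_counts))
```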

How does Avro union type work for nullable fields?

In Avro, a union is written as a JSON array of types, such as ["null","string"]. This means a field can hold either null or a string. The convention is to put "null" first in the union array when null is the default value, as the default must match the type of the first union member. Nullable fields look like: {"name":"email","type":["null","string"],"default":null}.

What is the difference between Forward and Backward Compatibility in Avro?

Backward Compatibility means new code (new schema) can read data written by old code (old schema): you can add fields with defaults or remove fields. Forward Compatibility means old code can read data written by new code: you can add fields (the old reader ignores unknown fields) or remove fields that have defaults in the old schema (the old reader falls back to those defaults). Full Compatibility requires both, meaning every field addition or removal must involve a default value.
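One way to see these rules concretely is a hypothetical two-version schema pair: because the added field carries a default, a v2 reader can read v1 data (backward) and a v1 reader can read v2 data (forward), so the change is fully compatible.

```python
# v1: the original schema (names are illustrative)
v1 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
]}

# v2: adds a field WITH a default -> fully compatible with v1
v2 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "source", "type": "string", "default": "unknown"},
]}

# Adding the same field WITHOUT a default would only be forward compatible:
# an old reader ignores it, but a new reader cannot fill it from v1 data.
```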

How do Avro logical types work?

Avro logical types annotate a primitive type with a logicalType property to give it semantic meaning. For example, {"type":"int","logicalType":"date"} stores a calendar date as days since the Unix epoch (1970-01-01). The decimal logical type adds precision and scale to bytes or fixed types. Avro libraries decode these logical types into native language types like LocalDate or BigDecimal.
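These encodings are simple enough to sketch with the standard library; this illustrates the on-the-wire meaning, not the avro library's API:

```python
from datetime import date
from decimal import Decimal

# date logical type: an int counting days since the Unix epoch
EPOCH = date(1970, 1, 1)
days = (date(2024, 3, 1) - EPOCH).days  # the int Avro would store

# decimal logical type: the unscaled integer is stored as big-endian
# two's-complement bytes; value = unscaled * 10**(-scale)
def encode_decimal(value: Decimal, scale: int) -> bytes:
    unscaled = int(value.scaleb(scale))
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return unscaled.to_bytes(length, "big", signed=True)

def decode_decimal(raw: bytes, scale: int) -> Decimal:
    return Decimal(int.from_bytes(raw, "big", signed=True)).scaleb(-scale)

assert decode_decimal(encode_decimal(Decimal("12.34"), 2), 2) == Decimal("12.34")
```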

What is Schema Resolution in Avro?

Schema Resolution is how Avro handles reading data when the writer schema and reader schema are different. Fields are matched by name (not position). If the writer has a field the reader does not, it is ignored. If the reader has a field the writer does not, the reader uses the field's default value. If a required field has no default and is missing in the writer schema, deserialization will fail.
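The rules above are mechanical enough to sketch as a toy resolver; this toy operates on plain dicts, not on the Avro library's decoded data:

```python
REQUIRED = object()  # sentinel: a reader field with no default

def resolve(writer_datum, reader_fields):
    """Match fields by name: unknown writer fields are dropped,
    missing reader fields fall back to their defaults, and a
    missing field with no default is an error."""
    out = {}
    for name, default in reader_fields:
        if name in writer_datum:
            out[name] = writer_datum[name]
        elif default is not REQUIRED:
            out[name] = default
        else:
            raise ValueError(f"field {name!r} missing and has no default")
    return out

old_datum = {"id": 7, "legacy_flag": True}         # written by the writer schema
new_reader = [("id", REQUIRED), ("email", "n/a")]  # reader schema's fields
assert resolve(old_datum, new_reader) == {"id": 7, "email": "n/a"}
```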

What is a Kafka Schema Registry and how does it use Avro?

Confluent Schema Registry is a central storage for Avro (and Protobuf/JSON Schema) schemas used in Kafka topics. Producers register schemas before publishing messages; consumers fetch schemas by ID to deserialize messages. The Registry enforces compatibility rules (backward, forward, full) to prevent incompatible schema changes from breaking consumers. Each Avro message in Kafka is prefixed with a 5-byte header: a magic byte (0) followed by a 4-byte schema ID.
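The wire framing itself is easy to show with struct (schema ID 42 is just a placeholder; real IDs are assigned by the Registry):

```python
import struct

SCHEMA_ID = 42  # placeholder for illustration

# Confluent wire format: magic byte 0, then the schema ID as a
# 4-byte big-endian unsigned int, then the Avro binary payload.
header = struct.pack(">bI", 0, SCHEMA_ID)
assert len(header) == 5

message = header + b"<avro-encoded body>"
magic, schema_id = struct.unpack(">bI", message[:5])
assert magic == 0 and schema_id == SCHEMA_ID
```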

How do I define a record with default values in Avro?

In an Avro record schema, each field can have a "default" property. For primitive types, the default must match the type (e.g., "default":0 for int). For union types, the default must match the first type in the union array (so put "null" first for nullable fields with null defaults). Example: {"name":"score","type":["null","int"],"default":null} or {"name":"count","type":"int","default":0}.
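That first-branch rule can be checked mechanically; a minimal sketch (a hypothetical helper covering only a few primitive branches) might look like:

```python
def default_matches_first_branch(field):
    """For union-typed fields, the JSON default must match the
    FIRST branch of the union (only a few branch types handled)."""
    branches = field["type"]
    if not isinstance(branches, list) or "default" not in field:
        return True  # not a union, or no default declared
    first, default = branches[0], field["default"]
    if first == "null":
        return default is None
    if first in ("int", "long"):
        return isinstance(default, int) and not isinstance(default, bool)
    if first == "string":
        return isinstance(default, str)
    return True  # other branch types not covered in this sketch

assert default_matches_first_branch(
    {"name": "score", "type": ["null", "int"], "default": None})
assert not default_matches_first_branch(
    {"name": "score", "type": ["int", "null"], "default": None})
```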