Hitchhiker's Guides

1 - Hitchhiker's Guide To Projection Selection

Introduction

Projections, the derived views of our event-sourced data, serve as vital components in shaping our applications, enabling efficient querying, analysis, and decision-making. However, with Factus offering a range of projection options, each with its own strengths and considerations, it becomes essential to choose wisely.

Our objective is to equip you with the knowledge and insights necessary to navigate the available options and make the right choices that align with your project’s requirements. We will delve into the intricacies of each projection type, uncover their unique features and trade-offs, and provide practical advice to aid your decision-making process.


Identifying Relevant Requirements

Before diving into the exploration of different projection types, it is essential to establish a clear understanding of the requirements that are relevant to your specific project. By identifying these requirements upfront, you can effectively narrow down the options and choose the projection type that best aligns with your project goals and constraints.

In this section, we will delve into a comprehensive list of possible requirements that should be considered when evaluating projection types. By examining and prioritizing these requirements, you will gain valuable insights into the trade-offs and considerations associated with each projection type.

Scalability

This requirement focuses on the ability of the chosen projection type to handle growing amounts of data and increasing workloads without compromising performance or functionality. Considerations include the horizontal scalability of the projection, the efficiency of data distribution, and the ability to handle concurrent updates and queries.

Performance

Performance refers to the speed and responsiveness of the projection type in processing events and serving queries. It involves evaluating factors such as event ingestion rates, query response times, and the impact of increasing data volumes on overall system performance. Choosing a projection type that can meet the desired performance benchmarks is crucial for maintaining a high-performing and responsive system.

Query flexibility

Query flexibility assesses the ability to express complex queries and retrieve relevant information efficiently. It involves evaluating the projection type’s support for various query patterns, such as filtering, aggregations, joins, and ad-hoc queries. Consider whether the chosen projection type enables the desired flexibility in querying the event-sourced data while maintaining good performance.

Complexity

Complexity refers to the level of intricacy and sophistication involved in implementing and managing the chosen projection type. It encompasses aspects such as the learning curve for developers, the architectural complexity of the projection, and the degree of operational complexity. It is important to assess whether the complexity aligns with the team’s expertise and resources.

Data consistency

Data consistency focuses on ensuring that the derived views produced by the projection type accurately reflect the state of the event stream. It involves assessing how well the projection type handles events, updates, and concurrent modifications to maintain a consistent and coherent view of the data across different projections. Ensuring data consistency is crucial for making reliable and accurate decisions based on the derived views.

Maintainability

Maintainability assesses the ease of managing, updating, and evolving the projection type over time. Considerations include the ability to accommodate changing business requirements, the ease of making modifications or adding new features, and the availability of monitoring and debugging tools. Choosing a projection type that is maintainable ensures long-term sustainability and adaptability of the system.


Comparing Projection Types

Let’s compare the different projection types and discuss the strengths and weaknesses of each with respect to the identified requirements.

Snapshot Projection

Snapshot Projection documentation

  • Scalability: by default, a Snapshot Projection stores its cached state in the FactCast server (aka the Event Store). This can create bottlenecks or impact the performance of the Event Store as the workload increases. Depending on the use case, this can be optimized by hooking into the snapshot lifecycle and changing the way snapshots are accessed, serialized, stored, and retained. Alternatively, the factcast-snapshotcache-redisson module can be used to store the snapshots in a Redis cluster instead (see the Best Practices and Tips section below).

  • Performance: whenever the projection is fetched for new events, the most recent snapshot is transferred, de-serialized, updated, re-serialized, and transferred back to the Event Store. Performance might therefore decrease as the snapshot size and/or the query frequency grows.

  • Query flexibility: a Snapshot Projection allows you to query and aggregate multiple event types on demand. Any data structure can be used to store the projected state, as long as it is serializable (a minimal sketch follows this list).

  • Complexity: the complexity of this projection varies: it is fairly easy to use with the default snapshot lifecycle implementation, but it gets more complex whenever aspects of that lifecycle need to be customized.

  • Data consistency: when fetched, a Snapshot Projection is guaranteed to return the most recent representation of the event stream. It supports optimistic locking to handle concurrent modifications. Check out the Optimistic Locking section for further details.

  • Maintainability: the projection allows you to change the business logic of the event handlers, create new data structures for the derived views, update existing ones, or add new ones. To do so, it is necessary to update the serial of the projection every time the projection class is changed - check out the documentation. The full event stream will then be re-consumed on the subsequent fetch, as the previously created snapshots are invalidated. Snapshot retention might need to be fine-tuned in the long run, based on the available resources and the query frequency.
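
To make this more concrete, here is a minimal sketch of a snapshot projection. The UserAdded and UserRemoved events, as well as the class and method names, are hypothetical, and the fetch call assumes the Factus API described in the linked documentation:

public class UserEmailsSnapshotProjection implements SnapshotProjection {

    // serializable state; this is what gets snapshotted and restored
    private final Map<UUID, String> userEmails = new HashMap<>();

    @Handler
    void apply(UserAdded event) {
        userEmails.put(event.getUserId(), event.getEmail());
    }

    @Handler
    void apply(UserRemoved event) {
        userEmails.remove(event.getUserId());
    }

    public Set<String> getEmails() {
        return new HashSet<>(userEmails.values());
    }
}

// on demand: restore the latest snapshot, apply the facts published since then,
// and return the up-to-date view
UserEmailsSnapshotProjection view = factus.fetch(UserEmailsSnapshotProjection.class);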

Aggregate

Aggregate documentation

  • Scalability: same considerations made for the Snapshot Projection apply.

  • Performance: same considerations made for the Snapshot Projection apply.

  • Query flexibility: depending on the event schema design, an Aggregate offers limited flexibility compared to a Snapshot Projection, as it builds views that are specific to a single entity (or aggregate, hence the name of this projection; see the sketch after this list). This does not prevent you from performing multiple queries for different aggregate IDs and relating the results, although alternative projection types may be better suited for such use cases and would reduce the number of requests sent to the server.

  • Complexity: the same considerations made for the Snapshot Projection apply; conceptually, though, this is the easiest projection to use, as it simply represents the state of a single entity.

  • Data consistency: same considerations made for the Snapshot Projection apply.

  • Maintainability: same considerations made for the Snapshot Projection apply.
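
As an illustration, here is a minimal sketch of an aggregate. The User class and the UserAdded event are hypothetical, and the find call assumes the Factus API for querying an aggregate by its ID:

public class User extends Aggregate {

    private String email;

    @Handler
    void apply(UserAdded event) {
        email = event.getEmail();
    }

    public String getEmail() {
        return email;
    }
}

// query the state of a single entity by its aggregate id
Optional<User> user = factus.find(User.class, userId);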

Managed Projection

Managed Projection documentation

Preface: considering a Managed Projection that has its state externalized in a shared database.

  • Scalability: a Managed Projection lets the application control the lifecycle of the views and adapt it to the expected workload. With a shared database, the derived views are uniformly accessible and consistent across multiple instances of the projection.

  • Performance: whenever the projection is updated, the events published since the last update are fetched from the Event Store and processed. The performance of the projection depends on the frequency of the updates, the amount of events that need to be processed, and of course the complexity of the business logic that manages the derived views.

  • Query flexibility: a Managed Projection allows you to query and aggregate multiple event types on demand. Potentially any external datasource can be used to store the derived views. Since Factus has no control over that datasource, the projection implementation itself needs to ensure proper concurrency handling whenever the underlying datasource doesn’t provide it.

  • Complexity: this projection requires you to implement the business logic that manages the derived views, and to handle concurrency (if needed). It might also be necessary to design the lifecycle of the projection so that it is updated at the desired frequency (e.g. using scheduled updates, as sketched after this list).

  • Data consistency: since a shared datasource is used to store the derived views, the same state is shared across all instances of the projection. The derived views might of course become stale as new events are published, but the projection can be updated while processing a query, to ensure that the most recent state is returned. It supports optimistic locking to handle concurrent modifications. Check out the Optimistic Locking section for further details.

  • Maintainability: a Managed Projection enables the construction of derived views that can potentially be queried even when the Event Store is unavailable. The projection allows you to change the business logic of the event handlers and the underlying structure of the derived views. To do so, it is necessary to update the serial of the projection every time the projection class is changed - check out the documentation. The full event stream will then be re-consumed on subsequent updates, to rebuild the derived views.
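
As an illustration of such a scheduled update lifecycle, here is a minimal sketch assuming a Spring application; UserEmails is a hypothetical managed projection bean backed by the shared database mentioned in the preface:

@Component
public class UserEmailsUpdater {

    private final Factus factus;
    private final UserEmails userEmails; // hypothetical ManagedProjection with externalized state

    public UserEmailsUpdater(Factus factus, UserEmails userEmails) {
        this.factus = factus;
        this.userEmails = userEmails;
    }

    // catch up with the facts published since the last update; queries served
    // in between may return slightly stale data
    @Scheduled(fixedDelay = 5000)
    public void update() {
        factus.update(userEmails);
    }
}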

Local Managed Projection

Local Managed Projection documentation

  • Scalability: a Local Managed Projection stores its state in memory. Depending on the use case, this can create performance and availability issues in the long run, whenever the size of the derived views grows over time or is affected by peaks. Remember that, when scaling horizontally, each instance maintains its own independent state, potentially resulting in data inconsistencies.

  • Performance: the same considerations made for the Managed Projection apply. Arguably, a Local Managed Projection performs better, as it doesn’t need to access an external datasource to store the derived views. However, keep in mind that the derived views are held in memory, so the memory footprint of the projection will grow over time, potentially affecting application performance.

  • Query flexibility: a Local Managed Projection offers the highest degree of freedom, as it lets you manage the in-memory views using whatever data structures the programming language offers.

  • Complexity: this projection only requires you to implement the business logic that manages the derived views. For this reason, it is probably the easiest projection to start with, especially for a proof of concept or a prototype.

  • Data consistency: since the derived views are stored in memory, the same state won’t be shared across multiple instances. In terms of staleness, the same considerations made for the Managed Projection apply.

  • Maintainability: a Local Managed Projection is the easiest projection to maintain, as it doesn’t require managing external datasources. Every time the application is stopped, the derived views are lost and need to be rebuilt on the next start: this makes it easy to test the projection and change its business logic, but it also has an impact on performance, as the derived views need to be rebuilt from scratch.

Subscribed Projection

Subscribed Projection documentation

Preface: considering a Subscribed Projection that has its state externalized in a shared database.

  • Scalability: only one instance will actually subscribe to the event stream and receive events asynchronously. This implies that horizontal scaling could be limited, as only one instance executes the handlers’ business logic. However, with a shared database, the derived views are uniformly accessible and consistent across multiple instances of the projection, so the query load can be spread.

  • Performance: after catching up, the projection consumes events right after they are published, with a small latency (expected to be below 100ms). The projection’s performance depends only on the complexity of the business logic and on the underlying datasource used to store the derived views.

  • Query flexibility: a Subscribed Projection allows you to query and aggregate multiple event types on demand. Potentially any external datasource can be used to store the derived views. Since Factus has no control over that datasource, the projection implementation itself needs to ensure proper concurrency handling whenever the underlying datasource doesn’t provide it. Since the derived views are updated asynchronously, a query can only return the state the projection has reached so far, which may not yet include the most recently published facts.

  • Complexity: this projection requires you to implement the business logic that manages the derived views, and to handle concurrency (if needed). The projection update itself is handled asynchronously by Factus, reducing the complexity of the application (see the sketch after this list).

  • Data consistency: since a shared datasource is used to store the derived views, the same state is shared across all instances of the projection. Since the application is not responsible for the projection update, it never knows the current projection state, which is therefore eventually consistent. This might be confusing, especially in a read-after-write scenario, where the user expects to see the result of an update immediately after the command is executed.

  • Maintainability: in terms of maintainability, a Subscribed Projection is similar to a Managed Projection, as it allows you to change the business logic of the event handlers and the underlying structure of the derived views. To do so, it is necessary to update the serial of the projection every time the projection class is changed - check out the documentation. The full event stream will then be re-consumed on the next catch-up phase (when the new projection starts), to rebuild the derived views.
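
To illustrate the subscribe-once lifecycle, here is a minimal sketch; UserEmails is a hypothetical projection implementing SubscribedProjection (for example backed by the shared database mentioned in the preface):

UserEmails view = new UserEmails();

// Factus first catches up on the existing fact stream, then keeps pushing newly
// published facts to the handlers asynchronously; only the instance holding the
// writer token actually processes them
factus.subscribe(view);

// queries can be served at any time, but are eventually consistent
Set<String> emails = view.getEmails();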

Local Subscribed Projection

Local Subscribed Projection documentation

  • Scalability: same considerations made for the Local Managed Projection apply.

  • Performance: same considerations made for the Local Managed Projection apply.

  • Query flexibility: same considerations made for the Local Managed Projection apply.

  • Complexity: same considerations made for the Local Managed Projection apply.

  • Data consistency: same considerations made for the Local Managed Projection apply, with the difference that, since the application is not responsible for the projection update, it never knows the current projection state, which is therefore eventually consistent.

  • Maintainability: same considerations made for the Local Managed Projection apply.


Selecting the Right Projection Type

When embarking on the journey of selecting the right projection type for your event sourcing project, it is crucial to carefully evaluate and prioritize the identified requirements based on your project’s unique context.

That being said, here are some general Q&As to help you make an informed choice:

Q: Are you still modelling for a prototype?

A: If yes, then you can start with a Local Projection, as it’s the easiest and most intuitive projection to implement, and quick to change and rebuild.

Q: Do you need to easily rebuild your projection?

A: If yes, then consider using a Local Projection, as it allows you to rebuild the in-memory derived views from scratch by simply restarting the application. Keep in mind that this might have an impact on the overall application performance, as it adds overhead to each deployment.

Q: Do you need to ensure high availability for the queries of a use case, even when the Event Store is unavailable?

A: If yes, then make sure to go for a projection that doesn’t rely on the Event Store for persisting its state. Consider the trade-offs between local and externalized states:

  • Local states are faster to query, easier to implement and to maintain, but they need to be rebuilt from scratch on every restart
  • Externalized states are harder to implement and maintain, but they can be rebuilt incrementally, and are available across multiple instances

Q: Does your query need to ensure read-after-write consistency?

A: If yes, then it’s suggested to choose a projection that can be updated synchronously, like a SnapshotProjection, an Aggregate or a ManagedProjection. Depending on the amount of data to be read and on the persistence layer, the impact on application performance will vary.

Q: Should the projected data be available for external services?

A: If yes, then opt for a projection that offers freedom in terms of persistence, like a ManagedProjection or a SubscribedProjection. This allows you to store the derived views in an external datasource and to query them using whatever technology is available.

Q: Does a specific query need a single entity or object?

A: If yes, then you can opt for a dedicated Aggregate for that query. Generally speaking, Aggregates are fast and easy to implement and maintain, but they might not be suitable for complex queries that require aggregating multiple event types. You can still use different projection types for different queries and combine them in your application.


Best Practices and Tips

Check this guide regularly, as it is updated with tips to improve performance or fix common issues.

When using Snapshot Projections, consider using the factcast-snapshotcache-redisson module to store the snapshots in a Redis cluster instead of the Event Store. This reduces the load on the Event Store and allows the snapshot cache to be scaled independently.
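
For a Maven build, this typically means adding a dependency along these lines (assuming the artifactId matches the module name):

<dependency>
    <groupId>org.factcast</groupId>
    <artifactId>factcast-snapshotcache-redisson</artifactId>
    <version>${factcast.version}</version>
</dependency>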

2 - Hitchhiker's Guide To Testing

Introduction

An event-sourced application usually performs two kinds of interactions with the FactCast server:

  • It subscribes to facts and builds up use-case specific views of the received data. These use-case specific views are called projections.
  • It publishes new facts to the event log.

Building up projections works on both APIs, low-level and Factus. However, to simplify development, the high-level Factus API has explicit support for this concept.

Unit Tests

Projections are best tested in isolation, ideally at the unit test level. In the end, they are classes receiving facts and updating some internal state. However, as soon as the projection’s state is externalized (e.g. see here) this test approach can get challenging.

Integration Tests

Integration tests check the interaction of more than one component. Here, we’re looking at integration tests that validate the correct behaviour of a projection that uses an external data store like a Postgres database.

Be aware that FactCast integration tests as shown below can start up real infrastructure via Docker. For this reason, they usually perform significantly slower than unit tests.


Testing FactCast (low-level)

This section introduces the UserEmails projection for which we will write

  • unit tests and
  • integration tests.

For interaction with FactCast we are using the low-level API.

The User Emails Projection

Imagine our application needs a set of user emails currently in use in the system. To provide this information, we identified these facts which contain the relevant data:

  • UserAdded
  • UserRemoved

The UserAdded fact contains a user ID and the email address. UserRemoved only carries the user ID to remove.

Here is a possible projection using the FactCast low-level API:

@Slf4j
public class UserEmailsProjection {

    private final Map<UUID, String> userEmails = new HashMap<>();

    @NonNull
    public Set<String> getUserEmails() {
        return new HashSet<>(userEmails.values());
    }

    public void apply(Fact fact) {
        switch (fact.type()) {
            case "UserAdded":
                handleUserAdded(fact);
                break;
            case "UserRemoved":
                handleUserRemoved(fact);
                break;
            default:
                log.error("Fact type {} not supported", fact.type());
                break;
        }
    }

    @VisibleForTesting
    void handleUserAdded(Fact fact) {
        JsonNode payload = parsePayload(fact);
        userEmails.put(extractIdFrom(payload), extractEmailFrom(payload));
    }

    @VisibleForTesting
    void handleUserRemoved(Fact fact) {
        JsonNode payload = parsePayload(fact);
        userEmails.remove(extractIdFrom(payload));
    }

    // helper methods:

    @SneakyThrows
    private JsonNode parsePayload(Fact fact) {
        return FactCastJson.readTree(fact.jsonPayload());
    }

    private UUID extractIdFrom(JsonNode payload) {
        return UUID.fromString(payload.get("id").asText());
    }

    private String extractEmailFrom(JsonNode payload) {
        return payload.get("email").asText();
    }
}

The method apply acts as an entry point for the projection and dispatches the received Fact to the appropriate handling behavior. There, the Fact object’s JSON payload is parsed using the Jackson library and the projection’s data (the userEmails map) is updated accordingly.

Note that we chose not to use a raw ObjectMapper here, but instead the helper class FactCastJson, as it contains a pre-configured ObjectMapper.

To query the projection for the user emails, the getUserEmails() method returns the values of our internal userEmails map, copied to a new Set.

Unit Tests

Unit testing this projection is very easy, as there are no external dependencies. We use Fact objects as input and check the resulting state of the internal map.

Let’s look at an example for the UserAdded fact:

@Test
void whenHandlingUserAddedFactEmailIsAdded() {
    // arrange
    String jsonPayload = String.format(
        "{\"id\":\"%s\", \"email\": \"%s\"}",
        UUID.randomUUID(),
        "user@bar.com");
    Fact userAdded = Fact.builder()
        .id(UUID.randomUUID())
        .ns("user")
        .type("UserAdded")
        .version(1)
        .build(jsonPayload);

    // act
    uut.handleUserAdded(userAdded);

    // assert
    Set<String> emails = uut.getUserEmails();
    assertThat(emails).hasSize(1).containsExactly("user@bar.com");
}

Note the use of the convenient builder provided by the Fact class.

Since the focus of this unit test is on handleUserAdded, we execute the method directly. The full unit test also contains a test for the dispatching logic of the apply method, as well as a similar test for the handleUserRemoved method.

Checking your projection’s logic should preferably be done with unit tests in the first place, even though you might also want to add an integration test to prove that it works in conjunction with its collaborators.

Integration Tests

FactCast provides a JUnit 5 extension which starts a FactCast server pre-configured for testing, plus its Postgres database, via the excellent testcontainers library, and resets their state between test executions.

Preparation

Before writing your first integration test

  • make sure Docker is installed and running on your machine
  • add the factcast-test module to your pom.xml:

<dependency>
    <groupId>org.factcast</groupId>
    <artifactId>factcast-test</artifactId>
    <version>${factcast.version}</version>
    <scope>test</scope>
</dependency>
  • to allow TLS-free communication between our test code and the local FactCast server, create an application.properties file in the project’s resources directory with the following content:
grpc.client.factstore.negotiationType=PLAINTEXT

This will make the client application connect to the server without using TLS.

Writing The Integration Test

Our integration test builds upon the previous unit test example. This time however, we want to check if the UserEmailsProjection can also be updated by a real FactCast server:

@SpringBootTest
@ExtendWith(FactCastExtension.class)
class UserEmailsProjectionITest {

    @Autowired FactCast factCast;

    private final UserEmailsProjection uut = new UserEmailsProjection();

    private class FactObserverImpl implements FactObserver {

        @Override
        public void onNext(@NonNull Fact fact) {
            uut.apply(fact);
        }
    }

    @Test
    void projectionHandlesUserAddedFact() {
        UUID userId = UUID.randomUUID();
        Fact userAdded = Fact.builder()
            .id(UUID.randomUUID())
            .ns("user")
            .type("UserAdded")
            .version(1)
            .build(String.format(
                "{\"id\":\"%s\", \"email\": \"%s\"}",
                userId,
                "user@bar.com"));

        factCast.publish(userAdded);

        SubscriptionRequest subscriptionRequest = SubscriptionRequest
            .catchup(FactSpec.ns("user").type("UserAdded"))
            .or(FactSpec.ns("user").type("UserRemoved"))
            .fromScratch();

        factCast.subscribe(subscriptionRequest, new FactObserverImpl()).awaitComplete();

        Set<String> userEmails = uut.getUserEmails();
        assertThat(userEmails).hasSize(1).containsExactly("user@bar.com");
    }
  //...

The previously mentioned FactCastExtension starts the FactCast server and the Postgres database once before the first test is executed. Between the tests, the extension wipes all old facts from the FactCast server so that you are guaranteed to always start from scratch.

Once a fact is received, FactCast invokes the onNext method of the FactObserverImpl, which delegates to the apply method of the UserEmailsProjection.

For details of the FactCast low-level API please refer to the API documentation.

Testing with Factus

Factus builds on the low-level FactCast API and provides a higher level of abstraction. To see Factus in action, we use the same scenario as before: a UserEmailsProjection which we will ask for the set of user emails.

These are the events we need to handle: UserAdded and UserRemoved. The UserAdded event contains two properties, the user ID and the email, whereas UserRemoved only contains the user ID.

An Example Event

To get an idea of how the events are defined, let’s have a look inside UserAdded:

@Getter
@Specification(ns = "user", type = "UserAdded", version = 1)
public class UserAdded implements EventObject {

    private UUID userId;
    private String email;

    // used by Jackson deserializer
    protected UserAdded(){}

    public static UserAdded of(UUID userId, String email) {
        UserAdded fact = new UserAdded();
        fact.userId = userId;
        fact.email = email;
        return fact;
    }

    @Override
    public Set<UUID> aggregateIds() {
        return Collections.emptySet();
    }
}

We create a Factus-compatible event by implementing the EventObject interface and supplying the fact details via the @Specification annotation. The event itself contains the properties userId and email, which are simply fields of the UserAdded class. The protected no-args constructor is used by Jackson when deserializing from JSON back to a POJO. The of factory method is used by application and test code to create a UserAdded event. For more details on how to define a Factus event, read on here.
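
The corresponding UserRemoved event can be sketched the same way; it only carries the user ID:

@Getter
@Specification(ns = "user", type = "UserRemoved", version = 1)
public class UserRemoved implements EventObject {

    private UUID userId;

    // used by Jackson deserializer
    protected UserRemoved() {}

    public static UserRemoved of(UUID userId) {
        UserRemoved fact = new UserRemoved();
        fact.userId = userId;
        return fact;
    }

    @Override
    public Set<UUID> aggregateIds() {
        return Collections.emptySet();
    }
}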

The User Emails Projection

Now that we know which events to handle, we can process them in the Factus based UserEmailsProjection:

public class UserEmailsProjection extends LocalManagedProjection {

    private final Map<UUID, String> userEmails = new HashMap<>();

    public Set<String> getEmails() {
        return new HashSet<>(userEmails.values());
    }

    @Handler
    void apply(UserAdded event) {
        userEmails.put(event.getUserId(), event.getEmail());
    }

    @Handler
    void apply(UserRemoved event) {
        userEmails.remove(event.getUserId());
    }
}

You will instantly notice how much shorter this implementation is compared to the UserEmailsProjection class of the low-level API example before. No dispatching or explicit JSON parsing is needed. Instead, each event handler method receives its event as a plain Java POJO, ready to use.

As projection type, we opted for a LocalManagedProjection, which is intended for self-managed, in-memory use cases. See here for detailed reading on the various projection types Factus supports.

Unit Tests

The unit test for this projection tests each handler method individually. As an example, here is the test for the UserAdded event handler:

@Test
void whenHandlingUserAddedEventEmailIsAdded() {
    UUID someUserId = UUID.randomUUID();

    UserEmailsProjection uut = new UserEmailsProjection();
    uut.apply(UserAdded.of(someUserId, "foo@bar.com"));

    Set<String> emails = uut.getEmails();
    assertThat(emails).hasSize(1).containsExactly("foo@bar.com");
}

First we create a UserAdded event, which we then apply to the responsible handler method of the UserEmailsProjection class. To check the result, we fetch the Set of emails and examine its content.

Integration Test

After covering each handler method with detailed tests at the unit level, we also want an integration test that runs against a real FactCast server.

Here is an example:

@SpringBootTest
@ExtendWith(FactCastExtension.class)
public class UserEmailsProjectionITest {

    @Autowired Factus factus;

    @Test
    void projectionHandlesUserAddedEvent() {
        UserAdded userAdded = UserAdded.of(UUID.randomUUID(), "user@bar.com");
        factus.publish(userAdded);

        UserEmailsProjection uut = new UserEmailsProjection();
        factus.update(uut);

        Set<String> emails = uut.getEmails();
        assertThat(emails).hasSize(1).containsExactly("user@bar.com");
    }
    //...

The annotations of the test class are identical to the integration test shown for the low-level API. Hence, we only introduce them quickly here:

  • @SpringBootTest
    • starts a Spring container to enable dependency injection of the factus Spring bean
  • @ExtendWith(FactCastExtension.class)
    • starts a FactCast and its Postgres database in the background
    • erases old events inside FactCast before each test

The test itself first creates a UserAdded event which is then published to FactCast. Compared to the low-level integration test, the “act” part is slim and shows the power of the Factus API: the call to factus.update(...) builds a subscription request for all events handled by the UserEmailsProjection class. The events returned from FactCast are then automatically applied to the correct handler.

The test concludes by checking whether the state of the UserEmailsProjection was updated correctly.

Full Example Code

The code for all examples introduced here can be found here.