
Custom & Agent

Custom

Custom is an extension mechanism provided by the framework. It lets developers implement requirements that the built-in recognition algorithms and action types cannot handle.

  1. Custom Recognition

    When the built-in recognition algorithms (such as TemplateMatch, FeatureMatch, OCR, etc.) cannot meet a specific image recognition need, you can implement your own recognition logic.

  2. Custom Action

    When the built-in action types (such as Click, Swipe, InputText, etc.) cannot meet a specific operational need, you can implement your own action.
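The two extension points differ mainly in what they return: a recognition decides whether, and where, a target appears, while an action performs an operation and reports success. The sketch below models that contract under stated assumptions. The class names `RecognitionExtension`, `ActionExtension`, `BrightPixelRecognition` and `RecordTapAction`, the `Box`/`Image` types and the method signatures are all hypothetical and are not the framework's actual binding API.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

# Hypothetical types for illustration only; not the framework's binding API.
Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[int]]          # grayscale pixels, indexed as image[y][x]


class RecognitionExtension(ABC):
    """Custom Recognition: decide whether, and where, a target appears."""

    @abstractmethod
    def analyze(self, image: Image) -> Optional[Box]:
        """Return the hit box, or None when nothing is recognized."""


class ActionExtension(ABC):
    """Custom Action: perform an operation, usually on a recognized box."""

    @abstractmethod
    def run(self, box: Optional[Box]) -> bool:
        """Return True if the operation succeeded."""


class BrightPixelRecognition(RecognitionExtension):
    # Toy rule: report the first pixel brighter than a threshold.
    def __init__(self, threshold: int = 200) -> None:
        self.threshold = threshold

    def analyze(self, image: Image) -> Optional[Box]:
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                if value > self.threshold:
                    return (x, y, 1, 1)
        return None


class RecordTapAction(ActionExtension):
    # Stand-in for a real device operation: record the tap point instead.
    def __init__(self) -> None:
        self.taps: List[Tuple[int, int]] = []

    def run(self, box: Optional[Box]) -> bool:
        if box is None:
            return False  # nothing recognized, so nothing to act on
        x, y, w, h = box
        self.taps.append((x + w // 2, y + h // 2))  # tap the box center
        return True


if __name__ == "__main__":
    image = [[0, 0, 0], [0, 0, 255], [0, 0, 0]]
    box = BrightPixelRecognition().analyze(image)
    tap = RecordTapAction()
    print(box, tap.run(box), tap.taps)  # (2, 1, 1, 1) True [(2, 1)]
```

Keeping recognition and action separate lets a Pipeline node reuse one recognition with different actions, and vice versa.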

There are two main ways to register Custom:

  • For registering Custom via UI, please refer to the respective UI manual.

  • For registering Custom via Agent, see Agent (⭐ Recommended).

For how to call Custom from a Pipeline, see Task Pipeline Protocol.

Agent

When you need custom recognition or custom actions that are difficult to implement with Pipeline, Agent provides a more robust extension approach.

  • Improves stability: Custom code is isolated from the framework core, reducing the risk that custom logic failures affect the whole process.
  • Easier to extend: Suitable for complex business logic that needs independent iteration.
  • Easier to maintain: Keeps a clearer boundary between framework upgrades and custom implementations.
  • Multi-language support: You can choose the programming language that suits you and leverage its native features.

AgentClient and AgentServer

AgentServer

AgentServer registers and executes your custom recognition and custom action code. It is typically where more complex recognition decisions, business actions, and event listeners are implemented. It focuses on the business logic itself, not on main-flow scheduling or framework lifecycle management.

A simple AgentServer example can be found in this template commit.
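To illustrate the "register and execute" role, here is a minimal sketch of an AgentServer-style registry, assuming JSON requests of the form `{"kind", "name", "param"}`. `ToyAgentServer`, its decorator names, the request format and the two handlers are hypothetical and do not reflect the framework's real API or wire protocol; refer to the template commit above for the actual usage. The `try`/`except` in `handle()` shows the isolation idea from the list above: a failing custom handler produces an error reply instead of crashing the caller.

```python
import json
from typing import Any, Callable, Dict


class ToyAgentServer:
    """Hypothetical registry: maps names to custom recognition/action handlers."""

    def __init__(self) -> None:
        self._recognitions: Dict[str, Callable[[dict], Any]] = {}
        self._actions: Dict[str, Callable[[dict], bool]] = {}

    def custom_recognition(self, name: str):
        def register(func):
            self._recognitions[name] = func
            return func
        return register

    def custom_action(self, name: str):
        def register(func):
            self._actions[name] = func
            return func
        return register

    def handle(self, raw_request: str) -> str:
        """Decode one JSON request, run the named handler, encode the reply."""
        request = json.loads(raw_request)
        table = self._recognitions if request["kind"] == "recognition" else self._actions
        handler = table.get(request["name"])
        if handler is None:
            return json.dumps({"ok": False, "error": f"unknown {request['kind']}: {request['name']}"})
        try:
            result = handler(request.get("param", {}))
        except Exception as exc:  # a failure in custom code becomes an error reply
            return json.dumps({"ok": False, "error": repr(exc)})
        return json.dumps({"ok": True, "result": result})


server = ToyAgentServer()


@server.custom_recognition("FindLargest")
def find_largest(param: dict):
    # Return the index of the largest value, or None for an empty list.
    values = param["values"]
    return max(range(len(values)), key=values.__getitem__) if values else None


@server.custom_action("ConfirmPurchase")
def confirm_purchase(param: dict) -> bool:
    # int() raises ValueError on malformed input; handle() reports it.
    return int(param["amount"]) > 0


if __name__ == "__main__":
    print(server.handle(json.dumps(
        {"kind": "recognition", "name": "FindLargest", "param": {"values": [3, 9, 4]}}
    )))  # {"ok": true, "result": 1}
    print(server.handle(json.dumps(
        {"kind": "action", "name": "ConfirmPurchase", "param": {"amount": "abc"}}
    )))
```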

AgentClient

AgentClient is part of the UI and is responsible for communicating with AgentServer. When the main flow reaches a custom node, AgentClient sends the request to AgentServer, waits for it to execute the custom logic, and returns the result so the main flow can continue. It bridges the invocation chain but carries no business rules of its own.
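The request/response round trip described above can be sketched with two in-process queues standing in for the real transport between client and server. This is an illustration under assumptions only: `ToyAgentClient`, `serve`, the JSON message fields and the shutdown message are hypothetical, not the framework's actual protocol, and error handling inside the server loop is omitted for brevity.

```python
import json
import queue
import threading
from typing import Any, Callable, Dict


def serve(requests: "queue.Queue[str]", replies: "queue.Queue[str]",
          handlers: Dict[str, Callable[[dict], Any]]) -> None:
    """Toy server loop: run named handlers until a shutdown message arrives."""
    while True:
        message = json.loads(requests.get())
        if message["name"] is None:  # hypothetical shutdown signal
            break
        result = handlers[message["name"]](message["param"])
        replies.put(json.dumps({"id": message["id"], "result": result}))


class ToyAgentClient:
    """Forwards a custom node to the server and hands the result back."""

    def __init__(self, requests: "queue.Queue[str]", replies: "queue.Queue[str]") -> None:
        self._requests = requests
        self._replies = replies
        self._next_id = 0

    def call(self, name: str, param: dict, timeout: float = 5.0) -> Any:
        self._next_id += 1
        self._requests.put(json.dumps({"id": self._next_id, "name": name, "param": param}))
        # Block until the server replies (raises queue.Empty on timeout).
        reply = json.loads(self._replies.get(timeout=timeout))
        if reply["id"] != self._next_id:
            raise RuntimeError("reply does not match the pending request")
        return reply["result"]

    def close(self) -> None:
        self._requests.put(json.dumps({"id": None, "name": None, "param": None}))


if __name__ == "__main__":
    requests, replies = queue.Queue(), queue.Queue()
    handlers = {"CountItems": lambda p: len(p["items"])}
    worker = threading.Thread(target=serve, args=(requests, replies, handlers))
    worker.start()
    client = ToyAgentClient(requests, replies)
    print(client.call("CountItems", {"items": ["a", "b", "c"]}))  # 3
    client.close()
    worker.join()
```

The client holds no business logic: it only serializes the request, waits, and returns whatever the server computed, which is the division of responsibilities described above.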

When do you need to write Agent?

When do you need to write AgentServer?

NOTE

Unless stated otherwise, "Agent" usually refers to AgentServer.

Writing an AgentServer is usually recommended in the following scenarios:

  • Custom recognition/action logic is complex, and implementing it with Pipeline is cumbersome.
  • Custom recognition/actions require special logic or algorithms that cannot be implemented with Pipeline.

When do you need to write AgentClient?

WARNING

AgentClient is handled by the UI. In most cases, you do not need to implement it yourself.

You only need AgentClient if the existing generic UIs cannot meet your needs and you are writing a new generic UI yourself; in that case, implementing AgentClient is required.