Data ingestion pipeline with Operation Management
Introduction
Goals
Marken Architecture
Cassandra Tables
ElasticSearch
APIs
Error handling
Future work

Annotation Operations
  • During its first run, the algorithm identified 500 objects in a given video file. These 500 objects were stored in Marken as annotations of a particular schema type, let’s say Objects.
  • The Algorithm team then improved their algorithm. When we re-ran it on the same video file, it created 600 annotations of schema type Objects and stored them in our service.
  • Before Algo run 1, clients searching for these annotations should not find anything.
  • After Algo run 1 completes, the query should find the first set of 500 annotations.
  • While Algo run 2 is creating the set of 600 annotations, client searches should still return the older 500 annotations.
  • When all 600 annotations have been successfully created, they should replace the older set of 500.
  • From then on, when clients search annotations for Objects, they should get the 600 annotations.
  • Write different runs to different databases. This is clearly very expensive.
  • Write algo runs to files. But we cannot search files or serve low-latency retrievals from them.
  • Etc.
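The search requirements above amount to one rule: exactly one finished operation is visible at a time, and a new operation becomes visible only when it finishes. The following toy, single-process model illustrates that rule. `ToyAnnotationStore` and its methods are hypothetical names. The model ignores Cassandra, ElasticSearch and concurrency, and it is a sketch under those assumptions, not Marken's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyAnnotationStore:
    """Illustrative model of operation visibility (hypothetical, not Marken code)."""

    # Operation id -> "STARTED" or "FINISHED".
    status: Dict[str, str] = field(default_factory=dict)
    # Operation id -> annotations written in that operation.
    annotations: Dict[str, List[str]] = field(default_factory=dict)
    # The single operation whose annotations search may return.
    active_op: Optional[str] = None

    def start(self, op_id: str) -> None:
        self.status[op_id] = "STARTED"
        self.annotations[op_id] = []

    def upsert(self, op_id: str, items: List[str]) -> None:
        # Writes land in the operation but stay hidden while it is STARTED.
        self.annotations[op_id].extend(items)

    def finish(self, op_id: str) -> None:
        self.status[op_id] = "FINISHED"
        # Switch search over to the new operation in a single step.
        self.active_op = op_id

    def search(self) -> List[str]:
        if self.active_op is None:
            return []
        return list(self.annotations[self.active_op])


store = ToyAnnotationStore()
before_run_1 = store.search()                      # [] : nothing yet
store.start("ID1")
store.upsert("ID1", [f"obj-{i}" for i in range(500)])
store.finish("ID1")
after_run_1 = len(store.search())                   # 500
store.start("ID2")
store.upsert("ID2", [f"obj-{i}" for i in range(600)])
during_run_2 = len(store.search())                  # still 500
store.finish("ID2")
after_run_2 = len(store.search())                   # 600
```

Each requirement maps to one line of the usage: an empty result before run 1, 500 results after it, 500 results while run 2 is in progress, and 600 results once run 2 finishes.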
Marken Architecture
  • Annotation Schema Type: identifies the schema of the annotation generated by the algorithm.
  • Annotation Schema Version: identifies the schema version of the annotation generated by the algorithm.
  • PivotId: a unique string identifier for the file or method used to generate the annotations. This could be the SHA hash of the file or simply the movie identifier number.
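As an illustration of the two PivotId options mentioned above, the following helpers derive a PivotId. The text does not say which SHA variant is used, so SHA-256 is an assumption here. The helper names (`pivot_id_for_stream`, `pivot_id_for_path`, `pivot_id_for_movie`) are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def pivot_id_for_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a binary stream (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    # Read incrementally so a large video file never has to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def pivot_id_for_path(path: str) -> str:
    """Hash a file on disk by streaming its bytes."""
    with open(path, "rb") as f:
        return pivot_id_for_stream(f)


def pivot_id_for_movie(movie_id: int) -> str:
    # Alternative: the movie identifier itself, rendered as a string.
    return str(movie_id)


# In-memory stand-in for a video file.
example = pivot_id_for_stream(io.BytesIO(b"fake video bytes"))
```

A content hash yields the same PivotId whenever the same bytes are processed again, which matches the re-run scenario above. The movie-ID form is cheaper but ties annotations to the title rather than to one specific file.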
{
  "annotationOperationKeys": [
    {
      "annotationType": "string", ❶
      "annotationTypeVersion": "integer",
      "pivotId": "string",
      "operationNumber": "integer" ❷
    }
  ],
  "id": "UUID",
  "operationStatus": "STARTED", ❸
  "isActive": true ❹
}
  1. We already explained AnnotationType, AnnotationTypeVersion and PivotId above.
  2. OperationNumber is an auto-incremented number for every new operation.
  3. OperationStatus: an operation goes through three phases (STARTED, FINISHED and CANCELED).
  4. IsActive: whether an operation and its associated annotations are active and searchable.
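The following data model mirrors the operation record described above and serializes back to its JSON shape. Several details are assumptions: the class and field names are hypothetical, and the text does not say what the operation-number sequence is scoped to, so here each (type, version, pivotId) combination gets its own counter. It also assumes a new operation starts out inactive, since STARTED annotations must not be searchable.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class OperationStatus(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    CANCELED = "CANCELED"


@dataclass(frozen=True)
class AnnotationOperationKey:
    annotation_type: str
    annotation_type_version: int
    pivot_id: str
    operation_number: int


@dataclass
class AnnotationOperation:
    keys: List[AnnotationOperationKey]
    operation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: OperationStatus = OperationStatus.STARTED
    # Assumption: not searchable until the operation is FINISHED.
    is_active: bool = False

    def to_json(self) -> dict:
        """Serialize to the camelCase JSON shape of the operation record."""
        return {
            "annotationOperationKeys": [
                {
                    "annotationType": k.annotation_type,
                    "annotationTypeVersion": k.annotation_type_version,
                    "pivotId": k.pivot_id,
                    "operationNumber": k.operation_number,
                }
                for k in self.keys
            ],
            "id": self.operation_id,
            "operationStatus": self.status.value,
            "isActive": self.is_active,
        }


class OperationNumberAllocator:
    """Hypothetical counter: one sequence per (type, version, pivotId)."""

    def __init__(self) -> None:
        self._counters: Dict[Tuple[str, int, str], int] = {}

    def next_number(self, annotation_type: str, version: int, pivot_id: str) -> int:
        key = (annotation_type, version, pivot_id)
        self._counters[key] = self._counters.get(key, 0) + 1
        return self._counters[key]


allocator = OperationNumberAllocator()
first = allocator.next_number("Objects", 1, "pivot-abc")
second = allocator.next_number("Objects", 1, "pivot-abc")
op = AnnotationOperation(keys=[AnnotationOperationKey("Objects", 1, "pivot-abc", second)])
record = op.to_json()
```

In a real service the counter would have to live in shared storage. An in-process dictionary like this one only works for a single writer.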
Cassandra Tables
  • AnnotationOperationById: stores the AnnotationOperations.
  • AnnotationIdByAnnotationOperationId: stores the IDs of all annotations in an operation.
ElasticSearch
Each annotation document also stores two operation fields:
  • annotationOperationId: the ID of the operation to which this annotation belongs.
  • isAnnotationOperationActive: whether the operation is in an ACTIVE state.
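The following sketch shows how the two Cassandra tables and the per-document operation fields relate. Plain dictionaries stand in for the Cassandra tables, and `index_document` is a hypothetical helper that returns the document one would send to ElasticSearch. The `annotationId` field name comes from the search section; the `label` payload is illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, Set

# Hypothetical in-memory stand-ins for the two Cassandra tables.
annotation_operation_by_id: Dict[str, Dict[str, Any]] = {}
annotation_id_by_annotation_operation_id: Dict[str, Set[str]] = defaultdict(set)


def index_document(op_id: str, annotation_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Record the annotation under its operation and build its search document."""
    annotation_id_by_annotation_operation_id[op_id].add(annotation_id)
    operation = annotation_operation_by_id[op_id]
    # Copy the operation fields onto the document so search can filter on them
    # without looking the operation up in Cassandra.
    return {
        **payload,
        "annotationId": annotation_id,
        "annotationOperationId": op_id,
        "isAnnotationOperationActive": operation["isActive"],
    }


annotation_operation_by_id["ID2"] = {"operationStatus": "STARTED", "isActive": False}
doc = index_document("ID2", "a-1", {"label": "car"})
```

Denormalizing the flag onto every document makes reads cheap. The cost is that finishing an operation must rewrite the flag on all documents of the previous operation.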

APIs

StartAnnotationOperation

UpsertAnnotationsInOperation

FinishAnnotationOperation

Once all annotations for an operation have been written, the client calls FinishAnnotationOperation, which:
  • Marks the current operation (say, with ID2) as operationStatus=FINISHED and isAnnotationOperationActive=TRUE.
  • Removes ID2 from Memcache, since it is no longer in the STARTED state.
  • Marks any previous operation (say, with ID1) that was ACTIVE as isAnnotationOperationActive=FALSE in Cassandra.
  • Finally, calls the updateByQuery API in ElasticSearch, which finds all ElasticSearch documents belonging to ID1 and sets isAnnotationOperationActive=FALSE on them.
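The finish steps can be sketched over plain dictionaries, with the ElasticSearch step expressed as the body of a `POST /<index>/_update_by_query` request, which accepts a `query` and a Painless `script`. The following is a sketch with several assumptions. `operations` stands in for AnnotationOperationById and is assumed to hold only operations sharing one (type, version, pivotId) key. `started_cache` stands in for EVCache, mapping STARTED operation IDs to their annotation IDs. The function names are hypothetical. The `term` query also assumes `annotationOperationId` is mapped as a keyword field.

```python
from typing import Any, Dict, List, Set


def deactivate_request_body(previous_op_id: str) -> Dict[str, Any]:
    """Body for POST /<index>/_update_by_query flipping the old operation's flag."""
    return {
        "query": {"term": {"annotationOperationId": previous_op_id}},
        "script": {
            "source": "ctx._source.isAnnotationOperationActive = false",
            "lang": "painless",
        },
    }


def finish_operation(
    op_id: str,
    operations: Dict[str, Dict[str, Any]],
    started_cache: Dict[str, Set[str]],
) -> List[Dict[str, Any]]:
    # 1. Mark the current operation FINISHED and active.
    operations[op_id].update(operationStatus="FINISHED", isActive=True)
    # 2. It is no longer STARTED, so stop excluding its annotations from search.
    started_cache.pop(op_id, None)
    es_requests = []
    for other_id, op in operations.items():
        if other_id != op_id and op["isActive"]:
            # 3. Deactivate the previously active operation (Cassandra side).
            op["isActive"] = False
            # 4. Queue the matching update for the search index.
            es_requests.append(deactivate_request_body(other_id))
    return es_requests


operations = {
    "ID1": {"operationStatus": "FINISHED", "isActive": True},
    "ID2": {"operationStatus": "STARTED", "isActive": False},
}
started_cache = {"ID2": {"a-601"}}
requests = finish_operation("ID2", operations, started_cache)
```

As in the steps above, the new operation is activated before the old one is deactivated. Both may therefore be visible for a short window, but search never returns zero results mid-switch.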

Search results must exclude:
  • any annotations from operations with isAnnotationOperationActive=FALSE, or
  • any annotations whose operations are currently in the STARTED state.
We do this by adding the following exclusions to all queries in our system:
  1. We add a filter to our ES query that excludes documents where isAnnotationOperationActive is FALSE.
  2. We query EVCache to find all operations that are in the STARTED state, then exclude every annotation whose annotationId is present in Memcache. Using Memcache keeps our search latencies low (most of our queries complete in under 100 ms).
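The two exclusions above could be wrapped around any client query as an ElasticSearch `bool` query with two `must_not` clauses: a `term` on the activity flag and a `terms` on the in-progress annotation IDs. In this sketch, `build_search_query` is a hypothetical helper, and `started_cache` again stands in for the EVCache lookup, mapping each STARTED operation ID to its annotation IDs.

```python
from typing import Any, Dict, Set


def build_search_query(
    user_query: Dict[str, Any], started_cache: Dict[str, Set[str]]
) -> Dict[str, Any]:
    """Wrap a client query so it skips inactive and in-progress annotations."""
    # Annotation ids written by operations that are still STARTED.
    in_progress_ids = sorted({aid for ids in started_cache.values() for aid in ids})
    # Exclusion 1: documents from inactive operations.
    must_not = [{"term": {"isAnnotationOperationActive": False}}]
    # Exclusion 2: documents from operations that have not finished yet.
    if in_progress_ids:
        must_not.append({"terms": {"annotationId": in_progress_ids}})
    return {"query": {"bool": {"must": [user_query], "must_not": must_not}}}


query = build_search_query({"match": {"label": "car"}}, {"ID3": {"a-2", "a-1"}})
plain = build_search_query({"match_all": {}}, {})
```

The second exclusion matters because documents from a STARTED operation carry isAnnotationOperationActive=FALSE only if they were written that way. The ID list catches them regardless. ElasticSearch caps the number of values in a `terms` query (`index.max_terms_count`), so a very large in-progress operation would need a different exclusion strategy.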
