Solved

DB extractor - storing incremental state for Timeseries query

  • 24 January 2023
  • 7 replies
  • 128 views

Userlevel 1

Hi,

 

I am observing below error(in bold) in extractor log for timeseries query:

 

2023-01-24 03:31:03.345 UTC [INFO    ] QueryExecutor_0 - Starting extraction of avocet-timeseries
2023-01-24 03:31:03.345 UTC [DEBUG   ] QueryExecutor_0 - Query: 'SELECT ITEM_ID + PROPERTY_TYPE+'Avocet' as "externalId", START_DATETIME as timestamp, PROPERTY_VALUE as value FROM ITEM_PROPERTY1 WHERE START_DATETIME is not null and PROPERTY_VALUE is not null and LAST_UPDT_DATE is not null ORDER BY LAST_UPDT_DATE ASC'
2023-01-24 03:31:03.612 UTC [INFO    ] QueryExecutor_0 - No more rows for avocet-timeseries. 2 rows extracted in 0.267 seconds
2023-01-24 03:31:03.873 UTC [DEBUG   ] ThreadPoolExecutor-9_0 - https://westeurope-1.cognitedata.com:443 "POST /api/v1/projects/slb-pdf/timeseries/data HTTP/1.1" 200 22
2023-01-24 03:31:03.875 UTC [DEBUG   ] ThreadPoolExecutor-9_0 - HTTP/1.1 POST https://westeurope-1.cognitedata.com/api/v1/projects/slb-pdf/timeseries/data 200
2023-01-24 03:31:03.877 UTC [INFO    ] QueryExecutor_0 - Uploaded 2 datapoints
2023-01-24 03:31:03.878 UTC [ERROR   ] QueryExecutor_0 - Can't store state for avocet-timeseries, LAST_UPDT_DATE is NULL/not present
2023-01-24 03:31:03.880 UTC [ERROR   ] QueryExecutor_0 - Error in extraction of avocet-timeseries: 'Missing incremental_field'

2023-01-24 03:31:03.881 UTC [INFO    ] QueryExecutor_0 - Closing ODBC connection to sqldb for avocet-timeseries
 

 

The query section in the configuration is as below:

  name: avocet-timeseries
  query: 
    SELECT
      ITEM_ID + PROPERTY_TYPE as "externalId",
      START_DATETIME as timestamp,
      PROPERTY_VALUE as value
    FROM
      ITEM_PROPERTY1
    WHERE
      START_DATETIME is not null and PROPERTY_VALUE is not null and LAST_UPDT_DATE is not null and LAST_UPDT_DATE >= CAST('{start_at}' AS DATETIME2)
    ORDER BY
      LAST_UPDT_DATE ASC   
  incremental-field: LAST_UPDT_DATE
  schedule: "*/1 * * * *"
  initial-start: '2020-01-01'
  destination-type: TIME_SERIES

 

 

If we use the raw queries, the state is being maintained properly but for timeseries query I am facing this issue.

Can you please take a look and provide the recommendations?

 

icon

Best answer by Pierre Pernot 27 January 2023, 08:59

View original

7 replies

Userlevel 5

Hi,


Can you try to add “LAST_UPDT_DATE” inside of the SELECT statement ? Please let us know if that works better.

 

Best regards,

Userlevel 1

Thanks Pierre for your reply. Instead of START_DATETIME I used LAST_UPDT_DATE for timestamp in select but I still see the same error. 

Userlevel 5

could you please share your whole query parameter ? including the incremental field ? 

Userlevel 1

The actual timestamp is present in the column  START_DATETIME but as you mentioned above I modified it to LAST_UPDT_DATE in select query. 

Here is the query section in config:  

-
  name: timeseries
  database: sqldb # same as above database

  query: 
    SELECT
      ID + TYPE as "externalId",
      LAST_UPDT_DATE as timestamp,
      PROPERTY_VALUE as value
    FROM
      ITEM_TABLE
    WHERE
      START_DATETIME is not null and PROPERTY_VALUE is not null and LAST_UPDT_DATE is not null and LAST_UPDT_DATE >= CAST('{start_at}' AS DATETIME2)
    ORDER BY
      LAST_UPDT_DATE ASC   
  incremental-field: LAST_UPDT_DATE
  schedule: "*/1 * * * *"
  initial-start: '2020-01-01'
  destination-type: TIME_SERIES

Userlevel 5

Hi,

I would try something like that: 
(I assume you are using SQL Server, am I correct ?)
 

- name: timeseries
database: sqldb # same as above database
query:
SELECT * FROM (
SELECT
ID + TYPE as "externalId",
CAST(LAST_UPDT_DATE as timestamp) as timestamp,
PROPERTY_VALUE as value
FROM
ITEM_TABLE
WHERE
START_DATETIME is not null and PROPERTY_VALUE is not null
)
WHERE {incremental_field} is not null and {incremental_field} >= {start_at}
ORDER BY
{incremental_field} ASC
incremental-field: timestamp
schedule: "*/1 * * * *"
initial-start: 1577836800000
destination-type: TIME_SERIES

If that does not work, please let us know and send us both the query and the logs, so we can help you debug.
Also, I don’t know what your LAST_UPDT_DATE column looks like, you might have to adjust the CAST statements.

Userlevel 1

Hi,

In our case LAST_UPDT_DATE column(incremental column) and START_DATETIME(actual physical timestamp of property) both are different. Looks like , the above query assumes both are same. 

so below query works assuming both are same:

    SELECT * FROM (
      SELECT
        ITEM_ID + PROPERTY_TYPE+'Avocet5050' as "externalId",
        CAST(LAST_UPDT_DATE AS DATETIME) as timestamp,
        PROPERTY_VALUE as value
      FROM
        ITEM_PROPERTY
      WHERE
        START_DATETIME is not null and PROPERTY_VALUE is not null
    ) as table1
    WHERE table1.{incremental_field} is not null and table1.{incremental_field} >= {start_at}
    ORDER BY
      table1.{incremental_field} ASC

  incremental-field: timestamp

What changes are required considering START_DATETIME as actual physical timestamp of measurement and LAST_UPDT_DATE as a incremental field(when that measurement got updated in database table).

 

 

 

 

Userlevel 1

Looks like, The extractor has minimum accepted date so that’s why there is an issue in storing incremental state 

Checked with below dates:

1971-01-01 05:29:59.000,  Error in logs: Discarding 1 datapoints due to bad timestamp or value

1971-01-01 05:30:00.000,  No Errors

so looks like it works when timestamp is >= 1971-01-01 05:30:00.000 (UTC+5.30 timezone). 

 

If extractor considers the source timestamp in UTC timezone , then it could have worked beyond 1971-01-01 00:00:01.000.  Does it mean that extractor considers source timezone? 

 

Reply