Posted to dev@lens.apache.org by "Raghav Shankar (JIRA)" <ji...@apache.org> on 2018/07/09 04:58:00 UTC
[jira] [Comment Edited] (LENS-1522) python client fails to parse timestamps correctly

    [ https://issues.apache.org/jira/browse/LENS-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532269#comment-16532269 ] 

Raghav Shankar edited comment on LENS-1522 at 7/9/18 4:57 AM:
--------------------------------------------------------------

The bug can be fixed, or at least worked around, by changing two lines in
{code:java}
contrib/clients/python/lens/client/query.py {code}
{code:java}
diff --git a/contrib/clients/python/lens/client/query.py b/contrib/clients/python/lens/client/query.py
index 7530e0244..51aaf2d04 100644
--- a/contrib/clients/python/lens/client/query.py
+++ b/contrib/clients/python/lens/client/query.py
@@ -19,6 +19,8 @@ import csv
import logging
import time
import zipfile
+import re
+import datetime as dt
import requests
from requests.exceptions import HTTPError
@@ -134,7 +136,18 @@ class LensPersistentResult(LensQueryResult):
if self.is_header_present:
next(reader_iterator)
for line in reader_iterator:
- yield self._parse_line(line)
+ newline = [
+ dt.datetime.strptime(
+ line[index],
+ '%Y-%m-%d %H:%M:%S.%f'
+ ).timestamp()
+ if self.header.columns[index].type == 'TIMESTAMP'
+ and isinstance(line[index], str)
+ and re.match(r'[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]', line[index]) is not None
+ else line[index]
+ for index in range(len(line))
+ ]
+ yield self._parse_line(newline)
byte_stream.close()
else:
stream = codecs.iterdecode(self.response.iter_lines(),
@@ -144,7 +157,18 @@ class LensPersistentResult(LensQueryResult):
if self.is_header_present:
next(reader_iterator)
for line in reader_iterator:
- yield self._parse_line(line)
+ newline = [
+ dt.datetime.strptime(
+ line[index],
+ '%Y-%m-%d %H:%M:%S.%f'
+ ).timestamp()
+ if self.header.columns[index].type == 'TIMESTAMP'
+ and isinstance(line[index], str)
+ and re.match(r'[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]', line[index]) is not None
+ else line[index]
+ for index in range(len(line))
+ ]
+ yield self._parse_line(newline)
stream.close()

{code}
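
For reference, the conversion that both hunks add can be pulled out into a standalone helper. This is only a minimal sketch, not the client's code: `convert_timestamps` and the `column_types` list are hypothetical names standing in for the lookup of `self.header.columns[index].type`, while the regular expression and format string are the ones used in the patch.

```python
import datetime as dt
import re

# Same shape the patch checks for, e.g. '2018-06-30 00:00:00.0'.
TIMESTAMP_RE = re.compile(
    r'[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]'
)
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S.%f'


def convert_timestamps(line, column_types):
    """Return a copy of a CSV row with TIMESTAMP strings as epoch seconds.

    `column_types` is a hypothetical list of type names, one per column.
    """
    converted = []
    for value, column_type in zip(line, column_types):
        if (column_type == 'TIMESTAMP'
                and isinstance(value, str)
                and TIMESTAMP_RE.match(value) is not None):
            # A naive datetime is interpreted in the local time zone
            # when timestamp() is called.
            value = dt.datetime.strptime(value, TIMESTAMP_FORMAT).timestamp()
        converted.append(value)
    return converted


row = convert_timestamps(['2018-06-30 00:00:00.0', '42'], ['TIMESTAMP', 'INT'])
```

Two caveats apply to the patch as much as to this sketch: `%f` accepts at most six fractional digits, so a value with more digits would still make `strptime` raise, and the resulting float depends on the local time zone of the machine running the client.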
 


was (Author: raghav.shankar):
the bug can be fixed/worked around by changing two lines in
{code:java}
contrib/clients/python/lens/client/query.py{code}
of the GitHub mirror.

 
 

> python client fails to parse timestamps correctly
> -------------------------------------------------
>
>                 Key: LENS-1522
>                 URL: https://issues.apache.org/jira/browse/LENS-1522
>             Project: Apache Lens
>          Issue Type: Bug
>          Components: python-client
>    Affects Versions: 2.7
>         Environment: MacOS X, python3.6
>            Reporter: Raghav Shankar
>            Priority: Major
>              Labels: easyfix, python
>             Fix For: 2.7
>
>         Attachments: lens.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In the latest version of the Python client (Python 3.6), if a query with a time field is submitted with fetch_results set to true, iterating through the returned results object always fails with a ValueError. Here is the error message:
> {code:java}
> ~/pythonprogs/prog1/src/lenspythonclient/contrib/clients/python/lens/client/query.py in <genexpr>(.0)
>     114
>     115     def _parse_line(self, line):
> --> 116         return list(self._mapping(self.header.columns[index].type)(line[index]) for index in range(len(line)))
>     117
>     118     def get_csv_reader(self, file):
> ValueError: invalid literal for int() with base 10: '2018-06-30 00:00:00.0'
> {code}
> It seems that the actual type of the date field is timestamp, and that the client does not convert it correctly: the traceback shows the value being passed to int().
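
The failure in the traceback can be reproduced outside the client. This is a minimal sketch, assuming (as the error message suggests) that the timestamp column ends up being parsed with `int`; the last line shows that the same string parses cleanly with the format used in the patch.

```python
import datetime as dt

value = '2018-06-30 00:00:00.0'

# Parsing the timestamp string as an integer raises the error from the report.
try:
    int(value)
except ValueError as exc:
    error = str(exc)

# The same string is accepted by strptime with a fractional-seconds format.
parsed = dt.datetime.strptime(value, '%Y-%m-%d %H:%M:%S.%f')
```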


