Posted to dev@druid.apache.org by Y H <yu...@gmail.com> on 2021/07/16 10:29:10 UTC

druid can't parse string

Hi, I am using Druid to develop an analytics web application,
and I found that Druid can't parse text in languages other than English.

[image: image.png]

Is there a UTF-8 option, or some other way to parse strings correctly?

I have attached my Druid environment file.
Please let me know how to parse strings correctly in Druid.

Thanks.



environment
___________________________________________________
DRUID_XMS=1g
DRUID_MAXNEWSIZE=250m
DRUID_NEWSIZE=250m
DRUID_MAXDIRECTMEMORYSIZE=6172m

druid_emitter_logging_logLevel=debug

druid_extensions_loadList=["druid-stats","druid-histogram",
"druid-datasketches", "druid-lookups-cached-global",
"postgresql-metadata-storage", "druid-kafka-indexing-service",
"druid-kafka-extraction-namespace"]

druid_zk_service_host=zookeeper

# kafka config
listeners=PLAINTEXT://211.253.8.155:59092


# druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=FoolishPassword

druid_coordinator_balancer_strategy=cachingCost

druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g",
"-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC",
"-Dfile.encoding=UTF-8",
"-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=268435456

druid_storage_type=local
druid_storage_storageDirectory=/opt/data/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/data/indexing-logs

druid_processing_numThreads=2
druid_processing_numMergeBuffers=2


DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration
status="WARN"><Appenders><Console name="Console"
target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c -
%m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef
ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog"
additivity="false" level="DEBUG"><AppenderRef
ref="Console"/></Logger></Loggers></Configuration>

Re: druid can't parse string

Posted by Ben Krug <be...@imply.io>.
Are you using the console, or an ingestion spec?  If you use a spec, you
might attach it.  If you're using the console, and if the strings have
commas in them, maybe .tsv would work, and you can create a file with a
different delimiter.  (In .tsv, you can choose the delimiter; it doesn't
have to be a tab.)  Or you can take a screenshot of what's happening and
attach that, it might help.
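
A minimal sketch of the delimiter idea above, written in Python and not taken from this thread: it re-emits comma-separated text with a different field delimiter, so commas inside values no longer collide with the separator. The function name `csv_to_delimited` and the `|` delimiter are illustrative assumptions; whichever character is chosen must also be set as the delimiter on the Druid side.

```python
import csv
import io


def csv_to_delimited(csv_text: str, delimiter: str = "|") -> str:
    """Re-emit CSV text as plain delimiter-separated lines (hypothetical helper)."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    lines = []
    for row in reader:
        for field in row:
            # Refuse values that would be ambiguous in the unquoted output.
            if delimiter in field or "\n" in field or "\r" in field:
                raise ValueError(f"field {field!r} conflicts with the output format")
        lines.append(delimiter.join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = 'name,comment\n"Kim, Y","hello"\n'
    print(csv_to_delimited(sample), end="")
```

The output is deliberately unquoted, which is why the sketch rejects any value that contains the new delimiter or a line break instead of trying to escape it.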

On Fri, Jul 16, 2021 at 11:25 AM Y H <yu...@gmail.com> wrote:

> Thanks!
> But I still have a problem.
>
> I succeeded in storing strings as UTF-8 with inline text ingestion. But when I
> try a batch ingestion from a CSV file, the text is encoded incorrectly.
>
> The problem seems to happen when the CSV is read. Should I convert the CSV file
> to a text file? And if I ingest batch data from a text file, which parser type
> should I choose (still csv)?

Re: druid can't parse string

Posted by Y H <yu...@gmail.com>.
Thanks!
But I still have a problem.

I succeeded in storing strings as UTF-8 with inline text ingestion. But when I
try a batch ingestion from a CSV file, the text is encoded incorrectly.

The problem seems to happen when the CSV is read. Should I convert the CSV file
to a text file? And if I ingest batch data from a text file, which parser type
should I choose (still csv)?



On Sat, Jul 17, 2021 at 1:46 AM, Gian Merlino <gi...@apache.org> wrote:

> Including the original poster in case they are not on the dev list
> themselves (hello!).

Re: druid can't parse string

Posted by Gian Merlino <gi...@apache.org>.
Including the original poster in case they are not on the dev list
themselves (hello!).

On Fri, Jul 16, 2021 at 9:44 AM Gian Merlino <gi...@apache.org> wrote:

> Druid stores strings as UTF-8 and from a storage and query basis, it
> should work fine with any language. The
> "wikiticker-2015-09-12-sampled.json.gz" dataset used for the tutorial has
> strings in a variety of languages (check the "page" field):
> https://druid.apache.org/docs/latest/tutorials/index.html
>
> So I wonder if there is an encoding problem with reading your input data?
> If it's in a text format, it should be encoded as UTF-8 for Druid to be
> able to read it properly.

Re: druid can't parse string

Posted by Gian Merlino <gi...@apache.org>.
Druid stores strings as UTF-8 and from a storage and query basis, it should
work fine with any language. The "wikiticker-2015-09-12-sampled.json.gz"
dataset used for the tutorial has strings in a variety of languages (check
the "page" field): https://druid.apache.org/docs/latest/tutorials/index.html

So I wonder if there is an encoding problem with reading your input data?
If it's in a text format, it should be encoded as UTF-8 for Druid to be
able to read it properly.
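
A minimal Python sketch of the check suggested above, under stated assumptions: it tests whether the raw bytes of an input file already decode as UTF-8 and, if not, re-encodes them from a fallback encoding. The helper name `to_utf8` is hypothetical, and `cp949` (a common legacy encoding for Korean text) is only a guess, since the thread never says what encoding the CSV actually uses.

```python
def to_utf8(raw: bytes, fallback_encoding: str = "cp949") -> bytes:
    """Return the content of `raw` as UTF-8 bytes (hypothetical helper)."""
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback encoding.
        text = raw.decode(fallback_encoding)
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = "한글,1\n".encode("cp949")  # simulated non-UTF-8 CSV row
    print(to_utf8(legacy).decode("utf-8"), end="")
```

For a real file, read it with `open(path, "rb")`, pass the bytes through the function, and write the result to a new file before ingesting it. A successful UTF-8 decode is strong evidence but not proof that the file was UTF-8 to begin with, so it is worth spot-checking the converted output.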

