Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/01/08 23:51:39 UTC

[jira] [Commented] (SPARK-12488) LDA describeTopics() Generates Invalid Term IDs

    [ https://issues.apache.org/jira/browse/SPARK-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090126#comment-15090126 ] 

Joseph K. Bradley commented on SPARK-12488:
-------------------------------------------

It was just reported on this thread that it might be a YARN issue since it did not appear in local or standalone mode.  CC: [~andrewor14] Any thoughts, and are there others I should ping?

Also, is there anyone with a YARN cluster who can try it on Spark 1.6 or master?

> LDA describeTopics() Generates Invalid Term IDs
> -----------------------------------------------
>
>                 Key: SPARK-12488
>                 URL: https://issues.apache.org/jira/browse/SPARK-12488
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.5.2
>            Reporter: Ilya Ganelin
>
> When running an LDA model and calling describeTopics(), the returned term ID list contains invalid values.
> The example below generates 10 topics on a data set with a vocabulary of 685 terms, so valid term IDs should fall in the range 0 to 684.
> {code}
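>     import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
>
>     // docTermVector: RDD[(Long, Vector)] of (document ID, term-count vector); its construction is not shown in the report
>     // Set LDA parameters
>     val numTopics = 10
>     val lda = new LDA().setK(numTopics).setMaxIterations(10)
>     val ldaModel = lda.run(docTermVector)
>     // The default EM optimizer yields a DistributedLDAModel
>     val distModel = ldaModel.asInstanceOf[DistributedLDAModel]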
> {code}
> {code}
> scala> ldaModel.describeTopics()(0)._1.sorted.reverse
> res40: Array[Int] = Array(2064860663, 2054149956, 1991041659, 1986948613, 1962816105, 1858775243, 1842920256, 1799900935, 1792510791, 1792371944, 1737877485, 1712816533, 1690397927, 1676379181, 1664181296, 1501782385, 1274389076, 1260230987, 1226545007, 1213472080, 1068338788, 1050509279, 714524034, 678227417, 678227086, 624763822, 624623852, 618552479, 616917682, 551612860, 453929488, 371443786, 183302140, 58762039, 42599819, 9947563, 617, 616, 615, 612, 603, 597, 596, 595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576, 575, 574, 573, 572, 571, 570, 569, 568, 567, 566, 565, 564, 563, 562, 561, 560, 559, 558, 557, 556, 555, 554, 553, 552, 551, 550, 549, 548, 547, 546, 545, 544, 543, 542, 541, 540, 539, 538, 537, 536, 535, 534, 533, 532, 53...
> {code}
> {code}
> scala> ldaModel.describeTopics()(0)._1.sorted
> res41: Array[Int] = Array(-2087809139, -2001127319, -1979718998, -1833443915, -1811530305, -1765302237, -1668096260, -1527422175, -1493838005, -1452770216, -1452508395, -1452502074, -1452277147, -1451720206, -1450928740, -1450237612, -1448730073, -1437852514, -1420883015, -1418557080, -1397997340, -1397995485, -1397991169, -1374921919, -1360937376, -1360533511, -1320627329, -1314475604, -1216400643, -1210734882, -1107065297, -1063529036, -1062984222, -1042985412, -1009109620, -951707740, -894644371, -799531743, -627436045, -586317106, -563544698, -326546674, -174108802, -155900771, -80887355, -78916591, -26690004, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 4...
> {code}
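
For anyone trying to reproduce this on a YARN cluster, a quick way to flag the bad entries is to split each topic's term IDs by whether they fall inside the vocabulary range. The following is only a minimal sketch: TermIdCheck and partitionTermIds are hypothetical names (not part of Spark), and the sample IDs are a handful copied from the output quoted above.

```scala
// Hypothetical helper, not part of Spark: separates term IDs that fall
// inside the vocabulary range [0, vocabSize) from those that do not.
object TermIdCheck {
  /** Returns (valid, invalid), where valid means 0 <= id < vocabSize. */
  def partitionTermIds(termIds: Array[Int], vocabSize: Int): (Array[Int], Array[Int]) =
    termIds.partition(id => id >= 0 && id < vocabSize)

  def main(args: Array[String]): Unit = {
    val vocabSize = 685 // vocabulary size stated in the report
    // A few IDs copied from the describeTopics() output in the report.
    val sample = Array(2064860663, 617, 616, -2087809139, 0, 1)
    val (valid, invalid) = partitionTermIds(sample, vocabSize)
    println(s"valid:   ${valid.mkString(", ")}")
    println(s"invalid: ${invalid.mkString(", ")}")
  }
}
```

Against a real model, the same check could be applied to the first element of each pair returned by ldaModel.describeTopics(); a non-empty invalid array for any topic would confirm the problem on that deployment mode.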


