Wednesday, June 4, 2014

Get MongoDB master from the command line

# Fill in the host/port of any member of the replica set.
MONGO_HOST=
MONGO_PORT=
# Ask the replica set who the primary is, then cut the host:port out of the
# "primary" line of the JSON that rs.isMaster() prints.
MONGO_HOST_PORT_MASTER=$(ssh remote@host "mongo ${MONGO_HOST}:${MONGO_PORT} --eval 'printjson(rs.isMaster())'" | grep primary | cut -d"\"" -f4)
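
If you need the same thing from code rather than the shell, here is a minimal sketch using the 2.x-era MongoDB Java driver (the host and port are placeholder assumptions):

import com.mongodb.CommandResult;
import com.mongodb.MongoClient;

public class FindMongoPrimary {
  public static void main(String[] args) throws Exception {
    // Connect to any replica-set member and ask it who the primary is.
    MongoClient client = new MongoClient("mongo-host.example.com", 27017);
    CommandResult result = client.getDB("admin").command("isMaster");
    System.out.println("primary: " + result.get("primary"));  // prints host:port of the master
    client.close();
  }
}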

MongoDB <-> Hive Hook

-- Hive external table backed by a MongoDB collection; the Hive columns map
-- positionally onto the Mongo fields listed in mongo.column.mapping.
create external table if not exists mongo_bla_display_order (
  mongo_id string,
  user_id string,
  display_order string
)
stored by 'org.yong3.hive.mongo.MongoStorageHandler'
with serdeproperties ( "mongo.column.mapping" = "_id,uid,displayOrder" )
tblproperties ( "mongo.host" = "${mongoHost}" , "mongo.port" = "${mongoPort}" ,
     "mongo.db" = "my_db" , "mongo.collection" = "persistence" );

Monday, June 2, 2014

Uncompress *.xz file

# pure xz file
unxz <filename>.xz

# tar xz file
tar -Jxf <filename>.tar.xz

Saturday, April 12, 2014

Play setup:

IntelliJ IDEA and Play: ambiguous 'index' definitions and similar issues.

This was the fix (Play 2.2 and IntelliJ 12):
https://groups.google.com/forum/#!topic/play-framework/X78Ikg9PMyE

'''

Hi,

I have a problem in IntelliJ when I create a new Java Play application, generate the IDE configuration and open the project.

I see the following error in IntelliJ - "Reference to 'index' is ambiguous, both 'views.html.index$' and 'views.html.index' match"

This only occurs in the following scenarios:

Enable: Play 2.0 Support plugin, Scala plugin, and built-in Playframework Support plugin that comes with IntelliJ Ultimate
Enable: Play 2.0 Support plugin and Scala plugin and Disable: the Playframework Support plugin that comes with IntelliJ Ultimate

There are no issues when I:

Enable: Scala plugin, and Playframework Support plugin that comes with IntelliJ Ultimate and Disable: Play 2.0 Support plugin

If I change the import statement:

import views.html.*; to be: import views.html.index; all of the above configurations work.

Would someone be able to explain why this issue is occurring? I'm happy to submit a PR with the above change if this is a reasonable fix. Before I figured out how to resolve it I did some searching and there are definitely a number of other people experiencing this issue without being able to find a solution, for example:


Denise


'''


Tuesday, September 3, 2013

Start HBase REST (Stargate)

ssh <user>@<hbase-remote-host>-nn1
> hbase rest start -p 7000 &   # start the REST server on port 7000, in the background
> disown                       # detach it from the shell so it survives logout
> exit
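
A quick way to verify the server came up, as a Java sketch against the standard Stargate /version endpoint (hostname and port are the ones used above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class StargateCheck {
  public static void main(String[] args) throws Exception {
    // Hit the REST server's version endpoint on the port chosen above.
    URL url = new URL("http://hbase-remote-host-nn1:7000/version");
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    for (String line; (line = in.readLine()) != null;) {
      System.out.println(line);
    }
    in.close();
  }
}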

Tuesday, August 13, 2013

Crawl / Curl from Hive


package com.blout.thunder.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.log4j.Logger;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

/**
 * Author: Nemanja Spasojevic
 */
@Description(
    name = "curl",
    value =  " Given a URL, returns the content of the web page; if the fetch fails for whatever" +
        " reason it returns null. " +
        "string _FUNC_(string) \n"
)
public class CurlUDF extends UDF {
  private static final Logger LOG = Logger.getLogger(CurlUDF.class);

  private static final int DEFAULT_SLEEP_TIME_MS = 1000;
  private static final int DEFAULT_RE_TRIES      = 3;
  private static final int LOG_STACK_FIRST_TIMES = 100;
  private static int counter_ = 0;  // total calls so far; limits stack-trace logging

  public String evaluate(String webPageURL) {
    return fetch(webPageURL, DEFAULT_SLEEP_TIME_MS);
  }

  public String evaluate(String webPageURL, int sleepTimeMS) {
    return fetch(webPageURL, sleepTimeMS);
  }

  public String fetch(String webPageURL, int sleepTimeMS) {
    ++counter_;

    for (int i = 1; i <= DEFAULT_RE_TRIES; ++i) {
      try {
        StringBuilder output = new StringBuilder();
        URL url = new URL(webPageURL);
        System.out.println(counter_ + ") Fetching try [" + i + "]: " + webPageURL);
        InputStream response = url.openStream();
        BufferedReader reader = new BufferedReader(new InputStreamReader(response));
        for (String line; (line = reader.readLine()) != null;) {
          output.append(line);
        }
        reader.close();
        Thread.sleep(sleepTimeMS);  // throttle between successful fetches
        System.out.println(counter_ + ") Fetching try [" + i + "]: success");
        return output.toString();
      } catch (Exception e) {
        if (LOG_STACK_FIRST_TIMES > counter_) {
          e.printStackTrace();
        }
        // Back off quadratically before falling through to the next retry.
        try { Thread.sleep(sleepTimeMS * i * i); } catch (Exception ignored) { }
      }
    }
    return null;
  }
}

CREATE TEMPORARY FUNCTION curl AS 'com.blout.thunder.hive.udf.CurlUDF';
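
To sanity-check the UDF outside Hive first, a minimal local driver can call it directly (CurlUDFSmokeTest is a hypothetical helper, assuming CurlUDF is on the classpath):

package com.blout.thunder.hive.udf;

public class CurlUDFSmokeTest {
  public static void main(String[] args) {
    CurlUDF udf = new CurlUDF();
    // Short sleep so the smoke test finishes quickly.
    String html = udf.fetch("http://example.com/", 100);
    System.out.println(html == null ? "fetch failed" : "fetched " + html.length() + " chars");
  }
}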




# Get the gradient colors (green to red). Always needed, so here is an example in JS:

getColor :  function(value, maxValue) {
    // Map value/maxValue onto an HSL hue from 360 (red, low values) down to 120 (green, at maxValue).
    var h = 120 + 240 - Math.max(0, Math.min(240, 240 * value / maxValue));
    return 'hsl(' + h + ',100%,90%)';
}

Friday, May 10, 2013

Hive: Force UDF execution to happen on reducer side

Doing a quick and dirty URL fetch from Hive, I wanted the URLs to be distributed among 5 jobs. The input is small, so it's very hard to tune the mapper side to make the work land on, say, 5 mappers. The trick is to push the rows through a subquery with distribute by, which forces a reduce phase; the UDF in the outer select then runs on the reducers.


Regular:


insert overwrite table url_raw_contant partition(dt = 20130606)
select full_url,
       priority,
       regexp_replace(curl_url(full_url),  '\n|\r', ' ') as raw_html
from url_queue_table_sharded_temp;



Forced UDF execution to the reducers (5 reducers):



set mapred.reduce.tasks=5;
insert overwrite table url_raw_contant_table partition(dt = 20130606)
select full_url,
       priority,
       regexp_replace(curl_url(full_url),  '\n|\r', ' ') as raw_html
from (
    select full_url, priority
    from url_queue_table_sharded_temp
    distribute by md5(full_url) % 5
    sort by md5(full_url) % 5, priority desc
) d
distribute by md5(full_url) % 5;

Thursday, March 21, 2013

Hive: make mappers re-use the JVM

Useful if your UDF has some kind of static initialization (e.g. loading from the distributed cache) and you want the initialized object to be reused across multiple map tasks.

SET mapred.job.reuse.jvm.num.tasks=100;
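
A minimal sketch of the kind of UDF this setting helps (DictionaryLookupUDF and its load step are hypothetical): the static map is built once per JVM, so with JVM reuse enabled, subsequent map tasks skip the expensive load.

import org.apache.hadoop.hive.ql.exec.UDF;

import java.util.HashMap;
import java.util.Map;

public class DictionaryLookupUDF extends UDF {
  // Shared by every task that runs inside this (reused) JVM.
  private static Map<String, String> dictionary = null;

  private static synchronized void initIfNeeded() {
    if (dictionary == null) {
      dictionary = new HashMap<String, String>();
      // ... load entries, e.g. from the distributed cache ...
    }
  }

  public String evaluate(String key) {
    initIfNeeded();
    return dictionary.get(key);
  }
}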


Monday, March 18, 2013

Port Forwarding on VM (VMware Fusion)


Use Case:
Usually you spin up a VM on your local box, but sometimes you need to share a server link with other folks in the office. By forwarding a VM port to a local port you can easily share access: others hit your machine, and it forwards the requests to the VM.


1) Edit 
 /Library/Preferences/VMware Fusion/vmnet8/nat.conf

....
# Use these with care - anyone can enter into your VM through these...
# The format and example are as follows:
#<external port number> = <VM's IP address>:<VM's port number>
#8080 = 172.16.3.128:80
8081 = 192.168.242.128:8081
...

2) Apply 
For the change to take effect, shut down the VM, quit the VMware Fusion application, then start VMware Fusion and the VM again.

Thursday, February 21, 2013

Maven Java Exec

mvn exec:java -Dexec.mainClass="com.klout.thunder.hive.udf.ExtractDictionaryKeyValuesUDF"  2>&1 |  grep BENCH_DICT

Wednesday, November 7, 2012

Freebase autosuggest

Great jQuery plugin for autosuggest:

http://wiki.freebase.com/wiki/Freebase_Suggest

# Schema explorer

http://schemas.freebaseapps.com/type?id=/type/property

Saturday, October 13, 2012

Small cross-domain handler for Scala Play

I rarely use Scala and Play, but occasionally I need to add some dashboard functionality. In this case I wanted to use the HBase REST API to get cells from HBase, which were encoded as JSON. You could read HBase from a Scala client, but using the REST interface is simpler; however, it requires cross-domain calls, which is why I needed some kind of cross-domain proxy. It may be useful for you if you use the Play/Scala framework. It works, but use it at your own risk.

CrossDomainExample.scala

object CrossDomainExample extends Controller {

  def crossDomain = Action(parse.json) {
    request =>
      request.body match {
        case JsObject(fields) =>
          val jsonMap = fields.toMap
          val url: JsValue = jsonMap("url")
          val acceptOpt: Option[JsValue] = jsonMap.get("accept")
          val acceptValue = acceptOpt match {
            case Some(header) => header.as[String]
            case _ => "application/json"
          }
          // Distinct name so it does not shadow the Action's `request` parameter.
          val targetUrl : String = url.as[String]
          Async {
            for (response <- WS.url(targetUrl).withHeaders("Accept" -> acceptValue).get()) yield {
              println("Sending : " + response.body)
              Ok(response.body).as(acceptValue)
            }
          }
        case _ => Ok("received something else: " + request.body + '\n')
      }
  }
}

// Routes
POST    /crossdomain                  controllers.CrossDomainExample.crossDomain

# Test from Bash 
 curl  --header "Content-type: application/json"  --request POST  --data '{"url": "http://sample-url-that-serves-json.com/getJson?id=123456789", "accept" : "application/json"}'  http://localhost:9000/crossdomain -v


Tuesday, October 9, 2012

Protocol Buffers and JSON (de)serialization


http://code.google.com/p/protobuf-java-format/

From JSON to proto:

Message.Builder builder = SomeProto.newBuilder();
String jsonFormat = ...; // load the JSON document from a source
JsonFormat.merge(jsonFormat, builder);

From proto to JSON:

Message someProto = SomeProto.getDefaultInstance();
String jsonFormat = JsonFormat.printToString(someProto);
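
Putting the two directions together, a round-trip sketch using the 2012-era static JsonFormat API from protobuf-java-format (SomeProto stands in for any compiled protobuf message class, as above):

import com.googlecode.protobuf.format.JsonFormat;

public class ProtoJsonRoundTrip {
  public static void main(String[] args) throws Exception {
    SomeProto original = SomeProto.newBuilder().build();  // set fields as needed

    // proto -> JSON
    String json = JsonFormat.printToString(original);
    System.out.println(json);

    // JSON -> proto
    SomeProto.Builder builder = SomeProto.newBuilder();
    JsonFormat.merge(json, builder);
    SomeProto parsed = builder.build();

    System.out.println(original.equals(parsed));  // true when the round trip is lossless
  }
}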

Tuesday, September 25, 2012

UPDATE / SET on JOIN (MySQL)


-- Copy sourceInfo from table2 onto every matching row of table1.
UPDATE table1
JOIN table2
ON table1.sourceId = table2.sourceId
SET table1.sourceInfo = table2.sourceInfo;

Thursday, September 13, 2012

\001 and sed madness


# When you need to replace a placeholder with a non-printable byte such as \001
# (e.g. Hive's default field separator), use command substitution to hand sed
# the literal character:
cat ~/Desktop/topic.csv | sed -e "s/_ESCAPE_TAG_/$(echo -e \\001)/g" > /tmp/t.table